Bugs item #1185932, was opened at 2005-04-19 15:01
Message generated for change (Comment added) made by stmane
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=1185932&group_id=56967
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PF/loader
Group: Pathfinder CVS Head
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Wouter Alink (vzzzbx)
Assigned to: Nobody/Anonymous (nobody)
Summary: XML: Entities
Initial Comment:
Hello,
I was trying to load the INEX collection into
MonetDB/XQuery... but I soon ran into problems. I'm
using the head of MonetDB and today's pathfinder. I
don't know the status of MonetDB/XQuery with regard to
XML entity support.
First of all, I was wondering which entities are
declared (&amp;, &gt;, etc.) by default? Is there a list of
these entities somewhere? (The only thing I could find
in the XML spec was that &nbsp; is not part of XML.)
Then, after defining some entities for myself I did the
following:
the document a1003.xml is the document attached.
the document a1003b.xml is equal to a1003.xml, but
without the line <!ENTITY mdash "d">.
The first problem is the fact that the entity &hyphen;
has been defined in a1003b.xml and in my opinion should
not be mentioned as an error (see the output below).
(the problem disappears in a1003.xml)
The second problem is that when shredding a 'correct'
document, an error is returned (!ERROR:
interpret_params: insert(param 1): invalid BAT. ), but
obviously it did store the xml-doc.
Besides these errors it might be nice to be able to
include a document in another one (this mechanism is
often used in the INEX collection):
<!ENTITY A1003 SYSTEM "a1003.xml">
Or is this already possible?
Grtz,
Wouter
-------------------------
output:
MonetDB>shred_doc("/ufs/alink/test-data/a1003b.xml","a1003.xml");
/ufs/alink/test-data/a1003b.xml:55: error: Entity
'mdash' not defined
y, let me welcome a number of new article editors to
our Editorial Board &mdash;
^
/ufs/alink/test-data/a1003b.xml:55: error: Entity
'hyphen' not defined
cotty, Keith Smillie, James Cortada, and Tim Bergin
have joined our long&hyphen;
^
/ufs/alink/test-data/a1003b.xml:55: error: Entity
'hyphen' not defined
Rosin, Brian Randell, Arthur Burks, Bernard Galler,
and Martin Campbell&hyphen;
^
/ufs/alink/test-data/a1003b.xml:55: error: Entity
'mdash' not defined
of the <it>Annals</it>. We also welcome aboard a new
Production Manager &mdash;
^
!ERROR: XML input not well-formed.
!ERROR: CMDshred2bats: operation failed.
MonetDB>shred_doc("/ufs/alink/test-data/a1003.xml","a1003.xml");
!ERROR: interpret_params: insert(param 1): invalid BAT.
MonetDB>shred_doc("/ufs/alink/test-data/a1003.xml","a1003.xml");
!ERROR: xmlshred: Document a1003.xml already exists
MonetDB>
----------------------------------------------------------------------
>Comment By: Stefan Manegold (stmane)
Date: 2007-01-23 18:06
Message:
Logged In: YES
user_id=572415
Originator: NO
Might be related to
bug #1642663 "XQ: shredding with inline DTD and ENTITIES does not work"
http://sourceforge.net/tracker/index.php?func=detail&aid=1642663&group_id=56967&atid=482468
and/or
bug #1642665 "XQ: shredding DTDs with ENTITIES erroneous"
http://sourceforge.net/tracker/index.php?func=detail&aid=1642665&group_id=56967&atid=482468
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2006-11-13 12:42
Message:
Logged In: YES
user_id=572415
see also BUG #1544002 "PF: several tests fail after recent
checkins" at
http://sourceforge.net/tracker/index.php?func=detail&aid=1544002&group_id=56967&atid=482468
----------------------------------------------------------------------
Comment By: Sjoerd Mullender (sjoerd)
Date: 2006-11-13 11:28
Message:
Logged In: YES
user_id=43607
The tests belonging to this bug now fail, and they fail
correctly. There are two xml documents that go with this
test, entities.xml and entities-invalid.xml. Contrary to
what the naming suggests, *both* documents are invalid, and
the shredder, correctly, complains about that.
The problem with the purportedly valid document is that it
has an internal DTD which does not actually describe the
document.
The question is, how to fix this. We can just approve the
output, or we can change the test so that it has a DTD that
matches the document.
----------------------------------------------------------------------
Comment By: Niels Nes (nielsnes)
Date: 2005-10-06 20:38
Message:
Logged In: YES
user_id=43556
BugDay_2005-10-06: Claimed by niels
BugDay_2005-10-06: TEST / SUCCESS
Test added as
tests/BugDay_2005-10-06_4.9.3/Tests/Entities.SF-1185932
----------------------------------------------------------------------
Comment By: Wouter Alink (vzzzbx)
Date: 2005-05-23 10:59
Message:
Logged In: YES
user_id=621590
The error returned for a malformed document reads:
MonetDB>shred_doc("/ufs/alink/test-data/a1003b.xml","a1003b.xml");
!ERROR: XML input not well-formed.
!ERROR: CMDshred2bats: operation failed.
In my opinion this solves the bug.
----------------------------------------------------------------------
Comment By: Sjoerd Mullender (sjoerd)
Date: 2005-04-28 13:19
Message:
Logged In: YES
user_id=43607
Have the problems been resolved? If so, please close the
bug report.
----------------------------------------------------------------------
Comment By: Wouter Alink (vzzzbx)
Date: 2005-04-20 12:48
Message:
Logged In: YES
user_id=621590
Thanks for helping out. I think I know where the second bug
stems from: I didn't delete my dbfarm after I installed the
new MonetDB with the changed internal format. I thought it
converted the BATs automatically, but apparently it didn't.
It no longer happens. (Thanks go to Peter.)
I guess this bug-report has been boiled down to: too many
errors are returned when shredding with undefined entities.
Shall I post a feature request for the
include-a-whole-document-entity, or should we just wait?
----------------------------------------------------------------------
Comment By: Jens Teubner (teubner)
Date: 2005-04-19 16:32
Message:
Logged In: YES
user_id=731390
I've tested the file (a1003.xml) you attached, and it loads
fine on my system.
You write that the same document, with the entity declaration
for mdash removed produces the errors you describe. I've
tested that as well and can confirm your error messages.
After removing the entity declaration, however, your document
in fact becomes invalid, so it is correct that shred_doc()
rejects the document. The unexpected behavior here is that
shred_doc() seems to complain about the hyphen entity as
well, although that one *is* defined.
I agree that the error messages in that case are a bit
misleading. However, they seem to be produced by the libxml
library that we use. I've checked with a small test program
that produces the exact same error messages. To me, this
appears like a bug in libxml2. (I'm surprised, though, that
xmllint reports errors correctly.)
Concerning your question about including other XML
documents (<!ENTITY A1003 SYSTEM "a1003.xml">):
I've tested that as well. Pathfinder currently does *not*
support that. Again, we depend on the libxml2 library. And
my simple SAX test program already couldn't handle these
includes. I've looked through the libxml2 API, and I suspect
that we hit the ``note that the use of this function for
unparsed entities may generate problems'' (API doc) issue
here. So I don't think we can handle this functionality in the
near future. (Again, xmllint can handle this. xmllint uses
libxml2's DOM mode, maybe this is why xmllint behaves
differently.)
What is left is your `interpret_params: insert(param 1):
invalid BAT' problem. I couldn't reproduce that one here.
What versions of MonetDB and Pathfinder do you use?
Did you initialize MonetDB's database with the same
versions or are they leftovers from older versions?
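
For reference, the external-entity include asked about above
(<!ENTITY A1003 SYSTEM "a1003.xml">) can be sketched with an event-based
parser as follows. This is only an illustration using Python's
xml.parsers.expat binding, not libxml2 and not what Pathfinder does. The
EXTERNAL_DOCS table, the parse_with_includes helper, and the sample
documents are hypothetical stand-ins for real files.

```python
import xml.parsers.expat

# Hypothetical stand-in for the file system: system identifier -> content.
EXTERNAL_DOCS = {"a1003.xml": "<article>included text</article>"}

# A document that pulls in another one through an external general entity.
MAIN_DOC = (
    '<!DOCTYPE collection [ <!ENTITY A1003 SYSTEM "a1003.xml"> ]>'
    "<collection>&A1003;</collection>"
)


def parse_with_includes(source):
    """Parse source and return the list of SAX-style events seen."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.EndElementHandler = lambda name: events.append(("end", name))

    def resolve(context, base, system_id, public_id):
        # Called by expat when it meets a reference to an external entity.
        content = EXTERNAL_DOCS.get(system_id)
        if content is None:
            return 0  # signals that the entity could not be handled
        # The sub-parser inherits the handlers set on the parent parser.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(content, True)
        return 1

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(source, True)
    return events


if __name__ == "__main__":
    for event in parse_with_includes(MAIN_DOC):
        print(event)
```

The events of the included document appear nested inside those of the
including one, which is the behavior the INEX collection relies on. If no
handler is installed, expat does not fetch the external entity at all.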
----------------------------------------------------------------------
Comment By: Jens Teubner (teubner)
Date: 2005-04-19 15:25
Message:
Logged In: YES
user_id=731390
Wouter,
thanks for your bug report.
Let me first address the question in your bug report about predefined
entities in XML. It is a common misunderstanding that all the entities
known from HTML are also valid in XML. XML predefines only five entities
(see also http://www.w3.org/TR/REC-xml/#sec-predefined-ent):
    &lt;    <
    &gt;    >
    &amp;   &
    &apos;  '
    &quot;  "
Anything else that you want to use must be declared in your DTD.
I'll now start looking into the code to see what Pathfinder actually does
about XML entities. DTD support has been introduced quite recently; the
support is probably not complete yet.
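
The difference between predefined, undeclared, and declared entities can
be checked with a small sketch. It uses Python's xml.etree.ElementTree
purely for illustration (Pathfinder itself relies on libxml2, whose error
messages differ). The helper name parse_text and the sample strings are
made up for this example.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_source):
    """Return the root element's text, or None if the input is rejected."""
    try:
        return ET.fromstring(xml_source).text
    except ET.ParseError:
        return None


# The five predefined entities need no declaration.
PREDEFINED = "<p>&lt; &gt; &amp; &apos; &quot;</p>"

# An HTML-style entity without a declaration is not well-formed XML.
UNDECLARED = "<p>Editorial Board &mdash;</p>"

# The same reference is accepted once the internal DTD subset declares it.
DECLARED = (
    '<!DOCTYPE p [ <!ENTITY mdash "&#x2014;"> ]>'
    "<p>Editorial Board &mdash;</p>"
)

if __name__ == "__main__":
    for label, doc in [("predefined", PREDEFINED),
                       ("undeclared", UNDECLARED),
                       ("declared", DECLARED)]:
        print(label, repr(parse_text(doc)))
```

Only the undeclared case is rejected. The declared case parses, and the
reference is replaced by the character U+2014.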
----------------------------------------------------------------------
_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs