Bugs item #1743433, was opened at 2007-06-26 11:55
Message generated for change (Comment added) made by jvrantwijk
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=1743433&group_id=56967
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PF general
Group: Pathfinder 0.18
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Joris van Rantwijk (jvrantwijk)
Assigned to: Jan Flokstra (jflokstra)
Summary: XQ: problem with ampersand in attribute value
Initial Comment:
XML attribute values containing an ampersand give unexpected query results. In
the query response, the ampersand is apparently escaped twice into &amp;#38;
Example document aap.xml:
<aap>
This is ok: &amp;
But this is bad: <noot q="&amp;" />
</aap>
The XQuery doc("aap.xml") now returns:
<aap>
This is ok: &amp;
But this is bad: <noot q="&amp;#38;"/>
</aap>
Note that the problem only occurs with shredded documents, not with nodes
embedded in the XQuery. This suggests that the bug is somewhere in the shredder.
I enabled the debug statements in shred_start_element in shredder.c. This
showed that val at that point already contains "&#38;" (I was expecting "&").
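
To illustrate why a still-encoded value in val leads to the output above,
here is a minimal sketch in C. It is not the Pathfinder serializer; the
helper name escape_attr and its escaping rules are assumptions made up for
this example. It only shows that a serializer which escapes "&" on output
produces a double-escaped result when the parser hands it "&#38;" instead
of "&".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from shredder.c): escape an attribute value
 * for XML output. Writes at most outsz-1 bytes plus a terminating NUL
 * and returns out. Input that does not fit is truncated. */
static char *escape_attr(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return out;

    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };   /* default: copy the byte as-is */
        const char *rep = one;
        size_t len;

        switch (*in) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '"': rep = "&quot;"; break;
        default:                  break;
        }

        len = strlen(rep);
        if (n + len >= outsz)          /* keep room for the NUL */
            break;
        memcpy(out + n, rep, len);
        n += len;
    }
    out[n] = '\0';
    return out;
}
```

With this helper, escape_attr("&", ...) yields "&amp;", which is the
expected output, while escape_attr("&#38;", ...) yields "&amp;#38;", which
matches the bad attribute value in the query result.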
----------------------------------------------------------------------
>Comment By: Joris van Rantwijk (jvrantwijk)
Date: 2007-07-06 18:24
Message:
Logged In: YES
user_id=1480851
Originator: YES
Hello Jan,
I found a relevant libxml2 bug report:
http://bugzilla.gnome.org/show_bug.cgi?id=172638
The libxml2 people seem to believe that &#38; is a perfectly reasonable
thing to do unless you explicitly set XML_PARSE_NOENT. I disagree, but I
think it unlikely that they would change this.
I tried your é case, and in fact it seems that this libxml2 bug is
now fixed.
When I run it against libxml2 2.6.20 it gives me two eacutes, but when I
run it against libxml2 2.6.27 it works as expected.
Joris.
----------------------------------------------------------------------
Comment By: Jan Flokstra (jflokstra)
Date: 2007-07-04 10:23
Message:
Logged In: YES
user_id=1054297
Originator: NO
I tried the XML_PARSE_NOENT option before, when working with user-defined
entities, but it caused very strange behaviour: it doubled the output for
user-defined entities. Try this old entity bug example (there was a
regression on entities_dtd.SF-1642663 and entities_dtd.SF-1642665 :-)
with the XML_PARSE_NOENT option:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo (bar)>
<!ELEMENT bar (#PCDATA)>
<!ENTITY eacute "é" ><!-- small e, acute accent -->
]>
<foo>
<bar>want to see a nice e with a acute accent:&eacute;</bar>
</foo>
You will see that it emits TWO eacutes here.
I am still convinced there is no easy way to solve this problem in the
shredder. Indeed, the example was not exactly your problem, but it is very
closely related. It also only happens with the &amp; entity. All other
entities are handled correctly. Maybe we should post a libxml2 bug report for
this (how?)
----------------------------------------------------------------------
Comment By: Joris van Rantwijk (jvrantwijk)
Date: 2007-07-03 17:12
Message:
Logged In: YES
user_id=1480851
Originator: YES
I agree that libxml is thoroughly messed up in this respect. But I'm not
convinced that the discussion you refer to is about exactly the same issue.
I found an option in libxml2 which seems to make it behave better.
Patch attached, it fixes the ampersand issue and apparently does not cause
regression on test cases.
File Added: pathfinder_ampersand_fix.diff
----------------------------------------------------------------------
Comment By: Jan Flokstra (jflokstra)
Date: 2007-07-03 14:05
Message:
Logged In: YES
user_id=1054297
Originator: NO
This seems to be a known bug in libxml software. I googled on this and
found for instance this discussion in a debian forum:
http://bugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=397395
I cannot do anything to fix it in the shredder. We just have to wait for
the fix in libxml2.
----------------------------------------------------------------------
Comment By: Peter Boncz (boncz)
Date: 2007-07-03 12:07
Message:
Logged In: YES
user_id=591107
Originator: NO
renamed bug with XQ: in title
----------------------------------------------------------------------
Comment By: Peter Boncz (boncz)
Date: 2007-07-03 11:59
Message:
Logged In: YES
user_id=591107
Originator: NO
shredder, entities.. so assigning to Jan
----------------------------------------------------------------------