Bugs item #1743433, was opened at 2007-06-26 11:55
Message generated for change (Settings changed) made by sjoerd
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=1743433&group_id=56967
Category: PF general
Group: Pathfinder 0.18
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Joris van Rantwijk (jvrantwijk)
>Assigned to: Sjoerd Mullender (sjoerd)
Summary: XQ: problem with ampersand in attribute value
Initial Comment:
XML attribute values containing an ampersand give unexpected query results. In
the query response, the ampersand is apparently escaped twice, into &amp;#38;
Example document aap.xml:
<aap>
This is ok: &amp;
But this is bad: <noot q="&amp;" />
</aap>
The XQuery doc("aap.xml") now returns:
<aap>
This is ok: &amp;
But this is bad: <noot q="&amp;#38;"/>
</aap>
Note that the problem only occurs with shredded documents, not with nodes
embedded in the XQuery. This suggests that the bug is somewhere in the shredder.
I enabled the debug statements in shred_start_element in shredder.c. This
showed that val at that point already contains "&#38;" (I was expecting "&").
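A minimal sketch of how the double escaping could arise, assuming the shredder
stores the attribute value as the five characters &#38; and the serializer then
escapes every '&' it sees. The function name escape_attr is hypothetical and is
not Pathfinder code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Pathfinder code): escape '&', '<' and '"' so the
 * string can be written as an XML attribute value.
 * Returns the number of bytes written (excluding the NUL terminator), or
 * (size_t)-1 if the output buffer is too small. */
size_t escape_attr(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };
        const char *rep;
        size_t len;

        switch (*in) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '"': rep = "&quot;"; break;
        default:  rep = single;   break;
        }
        len = strlen(rep);
        if (n + len + 1 > outsize)   /* keep room for the terminator */
            return (size_t)-1;
        memcpy(out + n, rep, len);
        n += len;
    }
    if (n + 1 > outsize)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Under these assumptions, escaping a stored value of "&#38;" yields "&amp;#38;",
whereas a stored value of "&" would yield the expected "&amp;".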
----------------------------------------------------------------------
>Comment By: Sjoerd Mullender (sjoerd)
Date: 2007-08-03 10:43
Message:
Logged In: YES
user_id=43607
Originator: NO
Last night's testing showed the bug is fixed.
----------------------------------------------------------------------
Comment By: Sjoerd Mullender (sjoerd)
Date: 2007-08-02 14:14
Message:
Logged In: YES
user_id=43607
Originator: NO
It seems that attribute values are returned as character entities, so I
added code to translate those to UTF-8. I also added a test, so now we
will see whether this fix fixes the problem on all platforms.
The test is
/pathfinder/tests/BugTracker/Tests/attr-entity-bug.SF-1743433.*
I'll leave this open until after tonight's testing.
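The idea of translating character references to UTF-8 can be sketched as
follows. This is not the code that was committed; decode_char_refs and
utf8_encode are hypothetical names, and the sketch only handles numeric
references (&#NNN; and &#xHHH;), leaving anything else untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write code point cp as UTF-8 into out (up to 4 bytes).
 * Returns the number of bytes written, or 0 (writing nothing) if cp is zero,
 * a surrogate, or beyond U+10FFFF. */
static size_t utf8_encode(unsigned long cp, char *out)
{
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: replace numeric character references in s, in place.
 * This is safe because a reference is always longer than its UTF-8 encoding.
 * Malformed or invalid references are copied through unchanged. Returns s. */
char *decode_char_refs(char *s)
{
    char *r = s;   /* read position */
    char *w = s;   /* write position, never ahead of r */

    assert(s != NULL);
    while (*r != '\0') {
        if (r[0] == '&' && r[1] == '#') {
            const char *p = r + 2;
            const char *start;
            unsigned long cp = 0;
            unsigned base = 10;

            if (*p == 'x') {
                base = 16;
                p++;
            }
            start = p;
            while (cp <= 0x10FFFF) {   /* stop early instead of overflowing */
                unsigned d;
                if (*p >= '0' && *p <= '9')
                    d = (unsigned)(*p - '0');
                else if (base == 16 && *p >= 'a' && *p <= 'f')
                    d = (unsigned)(*p - 'a') + 10;
                else if (base == 16 && *p >= 'A' && *p <= 'F')
                    d = (unsigned)(*p - 'A') + 10;
                else
                    break;
                cp = cp * base + d;
                p++;
            }
            if (p != start && *p == ';') {
                size_t n = utf8_encode(cp, w);
                if (n > 0) {
                    w += n;
                    r = (char *)p + 1;
                    continue;
                }
            }
        }
        *w++ = *r++;   /* ordinary character, or a reference left as-is */
    }
    *w = '\0';
    return s;
}
```

With this sketch, a buffer holding "&#38;" becomes the single character "&",
so a later serialization step would emit "&amp;" only once.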
----------------------------------------------------------------------
Comment By: Joris van Rantwijk (jvrantwijk)
Date: 2007-07-26 12:18
Message:
Logged In: YES
user_id=1480851
Originator: YES
Please note that the reported bug does not (as far as I can tell) depend
on the libxml2 version.
It is rather the proposed resolution (XML_PARSE_NOENT) which is known to
break an existing testcase (entities_dtd.SF-1642663 according to Jan
Flokstra) on some versions of libxml2.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2007-07-26 11:00
Message:
Logged In: YES
user_id=572415
Originator: NO
Is anybody interested in adding (or volunteering to add) a proper test for this
one to CVS?
Our nightly testing pool offers several versions of libxml2 (2.6.10,
2.6.16, 2.6.27, 2.6.28, 2.6.29), and could hence provide a simple check of
whether this bug indeed depends on the libxml2 version and, if so, in which
version it might be fixed ...
----------------------------------------------------------------------
Comment By: Fabian (mr-meltdown)
Date: 2007-07-26 10:39
Message:
Logged In: YES
user_id=963970
Originator: NO
The problem appears to be the same with dev-libs/libxml2-2.6.29
----------------------------------------------------------------------
Comment By: Joris van Rantwijk (jvrantwijk)
Date: 2007-07-06 18:24
Message:
Logged In: YES
user_id=1480851
Originator: YES
Hello Jan,
I found a relevant libxml2 bug report:
http://bugzilla.gnome.org/show_bug.cgi?id=172638
The libxml2 people seem to believe that &#38; is a perfectly reasonable
thing to do unless you explicitly set XML_PARSE_NOENT. I disagree, but I
think it unlikely that they would change this.
I tried your &eacute; case, and in fact it seems that this libxml2 bug is
now fixed.
When I run it against libxml2 2.6.20 it gives me two eacutes, but when I
run it against libxml2 2.6.27 it works as expected.
Joris.
----------------------------------------------------------------------
Comment By: Jan Flokstra (jflokstra)
Date: 2007-07-04 10:23
Message:
Logged In: YES
user_id=1054297
Originator: NO
I tried the XML_PARSE_NOENT option before, when using user-defined
entities, but it caused very strange behaviour: it doubled the output for
user-defined entities. Try this old entity bug example (there was a
regression on entities_dtd.SF-1642663 and entities_dtd.SF-1642665 :-)
with the XML_PARSE_NOENT option:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo (bar)>
<!ELEMENT bar (#PCDATA)>
<!ENTITY eacute "&#233;" ><!-- small e, acute accent -->
]>
<foo>
<bar>want to see a nice e with a acute accent:&eacute;</bar>
</foo>
You will see that it emits TWO eacutes here.
I am still convinced there is no easy way to solve this problem in the
shredder. Admittedly, the example is not exactly your problem, but it is very
closely related. It also only happens with the &amp; entity. All other
entities are handled correctly. Maybe we should post a libxml2 bug report for
this (how?)
----------------------------------------------------------------------
Comment By: Joris van Rantwijk (jvrantwijk)
Date: 2007-07-03 17:12
Message:
Logged In: YES
user_id=1480851
Originator: YES
I agree that libxml is thoroughly messed up in this respect. But I'm not
convinced that the discussion you refer to is about exactly the same issue.
I found an option in libxml2 which seems to make it behave better.
Patch attached, it fixes the ampersand issue and apparently does not cause
regression on test cases.
File Added: pathfinder_ampersand_fix.diff
----------------------------------------------------------------------
Comment By: Jan Flokstra (jflokstra)
Date: 2007-07-03 14:05
Message:
Logged In: YES
user_id=1054297
Originator: NO
This seems to be a known bug in the libxml software. I googled this and
found, for instance, this discussion in a Debian forum:
http://bugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=397395
I cannot do anything to fix it in the shredder. We just have to wait for
the fix in libxml2.
----------------------------------------------------------------------
Comment By: Peter Boncz (boncz)
Date: 2007-07-03 12:07
Message:
Logged In: YES
user_id=591107
Originator: NO
renamed bug with XQ: in title
----------------------------------------------------------------------
Comment By: Peter Boncz (boncz)
Date: 2007-07-03 11:59
Message:
Logged In: YES
user_id=591107
Originator: NO
shredder, entities.. so assigning to Jan
----------------------------------------------------------------------
_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs