Bugs item #1116471, was opened at 2005-02-04 13:06
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1116471&group_id=16035
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Steve Carter (cart33)
Assigned to: Maarten Coene (maartenc)
Summary: Problem with XPath and retrieving text

Initial Comment:
I have a JUnit test similar to the following:

    public void test() {
        final String XML = "<a><b>Water T &amp; D-46816</b></a>";
        final String XPATH = "a/b/text()";
        final String EXPECTED_VALUE = "Water T & D-46816";

        XPath xpathObj = createXpathObject(XPATH);
        Document doc = createDocument(XML);
        Object node = xpathObj.selectSingleNode(doc);

        String result = null;
        if (node instanceof Text) {
            result = ((Text) node).getText();
        }
        assertEquals(EXPECTED_VALUE, result);
    }

which fails because getText() only returns:

    Water T

Interrogating the node object returned from selectSingleNode() indicates that the expected result is present as 3 separate text elements in the content (ArrayList) member variable.

I can retrieve the value if I tweak the approach to use:

    final String XPATH = "a/b";
    if (node instanceof Element) {
        return (String) ((Element) node).getData();
    }

If I don't have entity references then the first approach always works. Therefore this seems to be a bug; please correct me if I am wrong.
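The behaviour reported above can be illustrated without dom4j. The following is only a sketch that uses the JDK's built-in W3C DOM as a stand-in (an assumption; it is not the dom4j API used in the test), and the class and method names are hypothetical. It builds <b> by hand with three adjacent text nodes, which mimics what the parser produced, and compares the first text node with the element's full text:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; uses the JDK's W3C DOM, not dom4j.
public class SplitTextDemo {

    /** Builds <a><b>...</b></a> where <b> gets one text node per piece. */
    static Element buildB(String... pieces) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element a = doc.createElement("a");
            Element b = doc.createElement("b");
            doc.appendChild(a);
            a.appendChild(b);
            for (String piece : pieces) {
                // Adjacent text nodes stay separate until normalize() is called.
                b.appendChild(doc.createTextNode(piece));
            }
            return b;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Roughly what selecting only the first text() child gives. */
    static String firstTextNode(Element e) {
        return e.getFirstChild().getNodeValue();
    }

    /** The element's string-value: all descendant text concatenated. */
    static String stringValue(Element e) {
        return e.getTextContent();
    }

    public static void main(String[] args) {
        Element b = buildB("Water T ", "&", " D-46816");
        System.out.println("[" + firstTextNode(b) + "]"); // [Water T ]
        System.out.println("[" + stringValue(b) + "]");   // [Water T & D-46816]
    }
}
```

Asking the element for its text, rather than asking for its first text child, is the same idea as the a/b variant of the test shown in the report.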
----------------------------------------------------------------------

Comment By: Lukas Theussl (lukas_theussl)
Date: 2006-04-03 16:55

Message:
Logged In: YES
user_id=1301221

Hi Maarten,

I have just re-built Maven-1.1 using dom4j from the DOM4J_1_X_BRANCH and, together with jaxen-1.1-beta-8, it seems to solve the problems that I reported at http://jira.codehaus.org/browse/JAXEN-67 !
This is great news for us, as upgrading dom4j and jaxen has been a long-standing blocker in Maven (see http://jira.codehaus.org/browse/MAVEN-1345). I still have to do some more thorough testing, but is there any chance that we could have a stable release soon with this fix included?

Thanks!
-Lukas

----------------------------------------------------------------------

Comment By: Maarten Coene (maartenc)
Date: 2006-03-24 14:05

Message:
Logged In: YES
user_id=178745

Bazza,

I've modified SAXContentHandler to also merge the CDATA sections if you set mergeAdjacentText to true.
Could you please try again with the version from CVS? (branch DOM4J_1_X_BRANCH)

thanks
Maarten

----------------------------------------------------------------------

Comment By: Victor (kromo)
Date: 2006-02-21 16:55

Message:
Logged In: YES
user_id=1156663

I encountered the same problem. In my case there were no entities, but a buffer boundary created the mismatches. I used something like //serialNumber/text() to collect all serial numbers, but one of these was split into two separate text() nodes. It makes a big difference, e.g. between "count(//serialNumber)" and "count(//serialNumber/text())", because these two numbers may not be equal even if <serialNumber> contains only PCDATA. Buffer boundaries should have no influence on the model.

----------------------------------------------------------------------

Comment By: Bazza (bazzargh)
Date: 2005-12-22 04:05

Message:
Logged In: YES
user_id=1005507

(came here from a related bug report filed against jaxen, see http://jira.codehaus.org/browse/JAXEN-67 )

Maarten, I think there's a legitimate bug here: /any/xpath/text() should only return multiple nodes for mixed content, not just when there are entities present.
e.g.:

    <a>this<b>has</b>two</a>

should return 2 for count(/a/text()); and with mixed content the stringValue of '/a' is not the same as '/a/text()' (referring to your workaround above).

    <a>this hasn't one</a>

should return 1 for the same expression (going by the xpath spec). Also:

    <a>this <![CDATA[has]]> one</a>

should return 1.

People using xpath with dom4j need to use normalize() to work around this whenever node() or text() appear in their expressions. Unfortunately the 'setMergeAdjacentText' method at parse time, which would appear to 'pre-normalize' the tree, doesn't. In SAXContentHandler (copying and pasting from my comments on JAXEN-67 ): inside 'characters()', this code:

    } else if (insideCDATASection) {
        if (mergeAdjacentText && textInTextBuffer) {
            completeCurrentTextNode();
        }
        cdataText.append(new String(ch, start, end));
    } else {
        ...

means that even if you've asked it to merge adjacent text nodes, it goes ahead and builds cdata nodes; which it then adds without checking the 'mergeAdjacentText' flag:

    public void endCDATA() throws SAXException {
        insideCDATASection = false;
        currentElement.addCDATA(cdataText.toString());
    }

To my mind, these should read, respectively:

    } else if (insideCDATASection && !mergeAdjacentText) {
        cdataText.append(new String(ch, start, end));
    } else {
        ...

    public void endCDATA() throws SAXException {
        // you'd want this condition around the code in startCDATA too.
        if (!mergeAdjacentText) {
            insideCDATASection = false;
            currentElement.addCDATA(cdataText.toString());
        }
    }

This would make 'mergeAdjacentText' normalize as it goes, which I'm guessing was the desired behaviour?

----------------------------------------------------------------------

Comment By: Michael Pichler (mpichler)
Date: 2005-12-16 05:31

Message:
Logged In: YES
user_id=613551

I stand corrected. The spec says that adjacent Text nodes should be merged automatically. Thus the normalize() call is a workaround (but at least, it should work).
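The "normalize as it goes" idea discussed above can be sketched with the JDK's own SAX parser. This is not dom4j's SAXContentHandler and not the actual fix; the class and method names are hypothetical. The handler buffers every characters() call, which also receives CDATA content when no lexical handler is registered, and only emits a text run when an element starts or ends, so entities, CDATA sections and buffer boundaries cannot split a run:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch using the JDK SAX API; not dom4j's SAXContentHandler.
public class MergingTextHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> textRuns = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one logical run of text.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    /** Emits the buffered characters as a single text run, if any. */
    private void flush() {
        if (buffer.length() > 0) {
            textRuns.add(buffer.toString());
            buffer.setLength(0);
        }
    }

    /** Returns the merged text runs of the whole document, in document order. */
    static List<String> textRunsOf(String xml) {
        try {
            MergingTextHandler handler = new MergingTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.textRuns;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textRunsOf("<a>this <![CDATA[has]]> one</a>")); // [this has one]
        System.out.println(textRunsOf("<a>this<b>has</b>two</a>"));        // [this, has, two]
    }
}
```

Note that the sketch lists text runs for the whole document rather than per element, so the mixed-content example yields three runs in total, two of which belong directly to <a>.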
----------------------------------------------------------------------

Comment By: Michael Pichler (mpichler)
Date: 2005-12-16 05:25

Message:
Logged In: YES
user_id=613551

Hi,

I think this is perfectly normal. There are multiple text() children which may be addressed separately with xpaths containing indices (see bug 1374352).

Your problem is that selectSingleNode() only selects the first matching text child, and it seems you should call normalize() on the root element first to "merge" adjacent Text nodes before any further processing.

regards,
Michael Pichler

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-04-13 19:14

Message:
Logged In: NO

I have just hit this bug (in production of course ;-). Originally using dom4j 1.4, but still present using dom4j 1.5.2, jaxen 1.0FCS.

My xpath is of the form "//a/b[text()="value"]/..". This failed in one case because the 'b' element had been parsed into two 'text' nodes. It seems it crossed some buffer boundary in the parsing stage, as the two text values are "TINBICS_SECOND" and "ARY_FEC" (i.e. just normal text). In our other test cases this has been parsed as a single text node. I verified the arbitrary splitting by adding spaces earlier in the file, and the position of the split moved accordingly: "TINBICS_SEC" and "ONDARY_FEC".

Replacing the xpath with "//a[b="value"]" solved the problem, so this seems to be a problem with using "text()" in the xpath.

The xpath spec says there should never be two adjacent text nodes.
http://www.w3.org/TR/xpath#section-Text-Nodes

Second, the xpath spec says that 'text()' should select all text nodes.
http://www.w3.org/TR/xpath#path-abbrev

I'm not sure if dom4j is "at fault", but it sure would be nice if it could at least be resilient to the problem. :-)

Andrew.
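The split text values quoted above, and the normalize() workaround suggested in this thread, can be sketched as follows. This uses the JDK's W3C DOM Node.normalize() as a stand-in for dom4j's normalize() (an assumption about equivalence of effect, not a claim about dom4j internals), and the class and method names are hypothetical:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class; uses the JDK's W3C DOM, not dom4j.
public class NormalizeDemo {

    /** Builds <b> with one text node per piece, simulating a split at a buffer boundary. */
    static Element splitElement(String... pieces) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element b = doc.createElement("b");
            doc.appendChild(b);
            for (String piece : pieces) {
                b.appendChild(doc.createTextNode(piece));
            }
            return b;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the direct text-node children of the element. */
    static int textNodeCount(Element e) {
        int count = 0;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Element b = splitElement("TINBICS_SECOND", "ARY_FEC");
        System.out.println(textNodeCount(b));                 // 2
        b.normalize();                                        // merges adjacent text nodes
        System.out.println(textNodeCount(b));                 // 1
        System.out.println(b.getFirstChild().getNodeValue()); // TINBICS_SECONDARY_FEC
    }
}
```

After normalization a predicate that compares a single text node against the full value has one node to match, which is why calling normalize() before evaluating text()-based expressions works around the split.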
----------------------------------------------------------------------

Comment By: Steve Carter (cart33)
Date: 2005-02-12 20:06

Message:
Logged In: YES
user_id=597933

Thanks for the explanation. Greatly appreciated. I have not made myself familiar with the specification, so I appreciate your insight. It just seemed intuitive to me that selectSingleNode() would return the full value of the node whether references were present or not. Feel free to close this issue and pursue it as an enhancement, as there are many approaches to satisfy the solution. I enjoy using your API, and thanks again for the help.

----------------------------------------------------------------------

Comment By: Maarten Coene (maartenc)
Date: 2005-02-12 07:04

Message:
Logged In: YES
user_id=178745

On the other hand, I see now in section 5.7 of the XPath spec that a text node shouldn't have immediately following siblings that are text nodes themselves, so this could be a bug indeed.
I'll investigate this further...

regards,
Maarten

----------------------------------------------------------------------

Comment By: Maarten Coene (maartenc)
Date: 2005-02-12 06:57

Message:
Logged In: YES
user_id=178745

I don't think this is a bug. The following happened:

The expression "a/b/text()" selects all text nodes of <b>. Because you have an entity reference in it, the SAX parser you have used did create 3 text nodes: "Water T ", "&" and " D-46816". The selectSingleNode() method returns the first node: "Water T ". So this is correct.

The expression "a/b" selects all <b> elements. If you apply the string function to it, you will retrieve the string-value of the <b> element.
This expression should do the trick: "string(a/b[1])", as illustrated by the example below:

    String xml = "<a><b>Water T &amp; D-46816</b></a>";
    Document doc = DocumentHelper.parseText(xml);
    String result = (String) doc.selectObject("string(a/b[1])");

Now, result is equal to "Water T & D-46816".

Another way is to retrieve the node and ask for the string-value directly on the node:

    Node node = doc.selectSingleNode("a/b");
    String result = node.getStringValue();

I hope this helped you out. If you still feel this is a bug, please tell me; otherwise I'll close this issue.

regards,
Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-02-10 09:49

Message:
Logged In: NO

This problem affects other xpath query types such as /a/b/* etc...

----------------------------------------------------------------------

_______________________________________________
dom4j-dev mailing list
dom4j-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dom4j-dev