Bugs item #1116471, was opened at 2005-02-04 22:06
Message generated for change (Comment added) made by mpichler
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1116471&group_id=16035
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Steve Carter (cart33)
Assigned to: Maarten Coene (maartenc)
Summary: Problem with XPath and retrieving text

Initial Comment:
I have a JUnit test similar to the following:

    public void test() {
        final String XML = "<a><b>Water T &amp; D-46816</b></a>";
        final String XPATH = "a/b/text()";
        final String EXPECTED_VALUE = "Water T & D-46816";
        XPath xpathObj = createXpathObject(XPATH);
        Document doc = createDocument(XML);
        Object node = xpathObj.selectSingleNode(doc);
        String result = null;
        if (node instanceof Text) {
            result = ((Text) node).getText();
        }
        assertEquals(EXPECTED_VALUE, result);
    }

This fails because getText() only returns "Water T ". Interrogating the node object returned from selectSingleNode() shows that the expected value is present as 3 separate text nodes in the content (ArrayList) member variable.

I can retrieve the value if I tweak the approach to use:

    final String XPATH = "a/b";
    if (node instanceof Element) {
        return (String) ((Element) node).getData();
    }

If I don't have entity references, the first approach always works. Therefore this seems to be a bug; please correct me if I am wrong.

----------------------------------------------------------------------
Comment By: Michael Pichler (mpichler)
Date: 2005-12-16 14:25

Message:
Logged In: YES
user_id=613551

Hi,

I think this is perfectly normal. There are multiple text() children, which may be addressed separately with XPaths containing indices (see bug 1374352). Your problem is that selectSingleNode() only selects the first matching text child; it seems you should call normalize() on the root element first to "merge" adjacent Text nodes before any further processing.
regards,
Michael Pichler

----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2005-04-14 04:14

Message:
Logged In: NO

I have just hit this bug (in production, of course ;-). I was originally using dom4j 1.4, but the problem is still present with dom4j 1.5.2 and jaxen 1.0FCS.

My XPath is of the form "//a/b[text()="value"]/..". This failed in one case because the 'b' element had been parsed into two text nodes. It seems the text crossed some buffer boundary in the parsing stage, as the two text values are "TINBICS_SECOND" and "ARY_FEC" (i.e. just normal text). In our other test cases this has been parsed as a single text node. I verified the arbitrary splitting by adding spaces earlier in the file, and the position of the split moved accordingly: "TINBICS_SEC" and "ONDARY_FEC".

Replacing the XPath with "//a[b="value"]" solved the problem, so this seems to be a problem with using "text()" in the XPath.

First, the XPath spec says there should never be two adjacent text nodes:
http://www.w3.org/TR/xpath#section-Text-Nodes
Second, the XPath spec says that 'text()' should select all text nodes:
http://www.w3.org/TR/xpath#path-abbrev

I'm not sure if dom4j is "at fault", but it sure would be nice if it could at least be resilient to the problem. :-)

Andrew.

----------------------------------------------------------------------
Comment By: Steve Carter (cart33)
Date: 2005-02-13 05:06

Message:
Logged In: YES
user_id=597933

Thanks for the explanation; it is greatly appreciated. I have not familiarized myself with the specification, so I appreciate your insight. It just seemed intuitive to me that selectSingleNode() would return the full value of the node whether entity references were present or not. Feel free to close this issue and pursue it as an enhancement, as there are many approaches to a solution. I enjoy using your API, and thanks again for the help.
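Michael's normalize() advice and Andrew's "//a[b="value"]" workaround can be sketched as follows. This is a minimal sketch, assuming the JDK's built-in W3C DOM (org.w3c.dom) and javax.xml.xpath as stand-ins for dom4j and jaxen, so it runs without extra jars; the class NormalizeSketch and its helper methods are hypothetical. The split text nodes are built by hand rather than produced by a parser. In dom4j itself, Element inherits normalize() from Branch.

```java
// Assumption: JDK DOM/XPath used in place of dom4j; all names here are hypothetical.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NormalizeSketch {

    // Builds <a><b>...</b></a> with each fragment stored as its own Text node,
    // imitating a parser that delivered the character data of <b> in pieces.
    static Document splitDocument(String... fragments) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element a = doc.createElement("a");
            Element b = doc.createElement("b");
            doc.appendChild(a);
            a.appendChild(b);
            for (String fragment : fragments) {
                b.appendChild(doc.createTextNode(fragment));
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The single <b> element under the root <a>.
    static Element b(Document doc) {
        return (Element) doc.getDocumentElement().getFirstChild();
    }

    // Number of child nodes (here: Text nodes) of <b>.
    static int textNodeCount(Document doc) {
        return b(doc).getChildNodes().getLength();
    }

    // Value of the first Text child of <b>, i.e. what a "first text() node" lookup sees.
    static String firstTextNode(Document doc) {
        return b(doc).getFirstChild().getNodeValue();
    }

    // Evaluates an XPath expression and reports whether it selected any node.
    static boolean selects(Document doc, String expression) {
        try {
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
            return node != null;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = splitDocument("Water T ", "&", " D-46816");
        System.out.println(textNodeCount(doc));             // 3
        System.out.println("[" + firstTextNode(doc) + "]");  // [Water T ]

        // Merge adjacent Text nodes everywhere below the root element.
        doc.getDocumentElement().normalize();
        System.out.println(textNodeCount(doc));             // 1
        System.out.println(firstTextNode(doc));             // Water T & D-46816

        // Andrew's workaround compares the string-value of <b>, so the split does not matter.
        Document split = splitDocument("TINBICS_SECOND", "ARY_FEC");
        System.out.println(selects(split, "//a[b='TINBICS_SECONDARY_FEC']")); // true
    }
}
```

The sketch does not claim that the JDK parser itself splits text the way the reporter's SAX parser did; it only shows that once adjacent Text nodes exist, normalize() merges them, while a comparison on the element's string-value never depended on the split.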
----------------------------------------------------------------------
Comment By: Maarten Coene (maartenc)
Date: 2005-02-12 16:04

Message:
Logged In: YES
user_id=178745

On the other hand, I now see in section 5.7 of the XPath spec that a text node shouldn't have immediately following siblings that are text nodes themselves, so this could be a bug indeed. I'll investigate this further...

regards,
Maarten

----------------------------------------------------------------------
Comment By: Maarten Coene (maartenc)
Date: 2005-02-12 15:57

Message:
Logged In: YES
user_id=178745

I don't think this is a bug. The following happened: the expression "a/b/text()" selects all text nodes of <b>. Because you have an entity reference in it, the SAX parser you used created 3 text nodes: "Water T ", "&" and " D-46816". The selectSingleNode() method returns the first node: "Water T ". So this is correct.

The expression "a/b" selects all <b> elements. If you apply the string() function to it, you will retrieve the string-value of the <b> element. This expression should do the trick: "string(a/b[1])", as illustrated by the example below:

    String xml = "<a><b>Water T &amp; D-46816</b></a>";
    Document doc = DocumentHelper.parseText(xml);
    String result = (String) doc.selectObject("string(a/b[1])");

Now result is equal to "Water T & D-46816".

Another way is to retrieve the node and ask for the string-value directly on the node:

    Node node = doc.selectSingleNode("a/b");
    String result = node.getStringValue();

I hope this helps you out. If you still feel this is a bug, please tell me; otherwise I'll close this issue.

regards,
Maarten

----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2005-02-10 18:49

Message:
Logged In: NO

This problem affects other XPath query types, such as /a/b/* etc.
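Maarten's two string-value approaches, string(a/b[1]) and asking the node for its string-value, can be sketched the same way. This is a minimal sketch, assuming the JDK's javax.xml.xpath and org.w3c.dom in place of dom4j: XPath.evaluate(String, Object) plays the role of selectObject("string(a/b[1])"), and Node.getTextContent(), which for an element concatenates the text of all its descendants, stands in for getStringValue(). The class StringValueSketch and its methods are hypothetical.

```java
// Assumption: JDK DOM/XPath used in place of dom4j; all names here are hypothetical.
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class StringValueSketch {

    // Parses an XML string into a W3C DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counterpart of doc.selectObject("string(a/b[1])"): string() returns the
    // string-value of <b>, i.e. all of its descendant text concatenated.
    static String stringOfFirstB(Document doc) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate("string(a/b[1])", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counterpart of selectSingleNode("a/b").getStringValue(): ask the element itself.
    static String textContentOfFirstB(Document doc) {
        return doc.getDocumentElement().getElementsByTagName("b").item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document parsed = parse("<a><b>Water T &amp; D-46816</b></a>");
        System.out.println(stringOfFirstB(parsed));      // Water T & D-46816
        System.out.println(textContentOfFirstB(parsed)); // Water T & D-46816

        // Same result when <b> holds three separate Text nodes, built by hand here.
        Document split = parse("<a><b/></a>");
        Element b = (Element) split.getDocumentElement().getFirstChild();
        for (String fragment : new String[] {"Water T ", "&", " D-46816"}) {
            b.appendChild(split.createTextNode(fragment));
        }
        System.out.println(stringOfFirstB(split));       // Water T & D-46816
        System.out.println(textContentOfFirstB(split));  // Water T & D-46816
    }
}
```

Both routes read the element rather than one of its text children, which is why they give the full value whether or not the parser split the text.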
----------------------------------------------------------------------

_______________________________________________
dom4j-dev mailing list
dom4j-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dom4j-dev