[dom4j-dev] [ dom4j-Bugs-1116471 ] Problem with XPath and retrieving text

SourceForge.net Fri, 16 Dec 2005 05:26:24 -0800

Bugs item #1116471, was opened at 2005-02-04 22:06
Message generated for change (Comment added) made by mpichler
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1116471&group_id=16035


Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Steve Carter (cart33)
Assigned to: Maarten Coene (maartenc)
Summary: Problem with XPath and retrieving text

Initial Comment:
I have a Junit test similar to the following:

public void test() {

      final String XML = "<a><b>Water T &amp; D-46816</b></a>";
      final String XPATH = "a/b/text()";
      final String EXPECTED_VALUE = "Water T & D-46816";

      XPath xpathObj = createXpathObject(XPATH);
      Document doc = createDocument(XML);
      Object node = xpathObj.selectSingleNode(doc);

      // result stays null if the XPath did not select a Text node
      String result = null;
      if (node instanceof Text) {
          result = ((Text) node).getText();
      }

      assertEquals(EXPECTED_VALUE, result);
}

which fails because getText() only returns: Water T

interrogating the node object returned from
selectSingleNode indicates that the expected result is
present as 3 separate text elements in the content
(ArrayList) member variable

I can retrieve the value if I tweak the approach to use:
 
    final String XPATH = "a/b";

     if (node instanceof Element) {
            return  (String) ((Element) node).getData();
        }

If I don't have entity references, then the first
approach always works. Therefore this seems to be a
bug; please correct me if I am wrong.


----------------------------------------------------------------------

Comment By: Michael Pichler (mpichler)
Date: 2005-12-16 14:25

Message:
Logged In: YES 
user_id=613551

Hi,

I think this is perfectly normal. There are multiple
text() children which may be addressed separately with
xpaths containing indices (see bug 1374352).

Your problem is that selectSingleNode() only selects the
first matching text child, and it seems you should call
normalize() on the root element first to "merge" adjacent
Text nodes before any further processing.
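
The merging step can be sketched without dom4j on the classpath. The
following is only an illustration under stated assumptions: it uses the
JDK's org.w3c.dom API instead of dom4j, the class and method names are
hypothetical, and the three text nodes are created by hand to mimic the
split described in this thread.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: JDK DOM stand-in for the dom4j scenario in this thread.
class NormalizeSketch {

    /** Returns {text node count before, text node count after, merged text}. */
    static String[] mergeAdjacentText() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        // Build <a><b>...</b></a> with three adjacent text nodes under <b>.
        Element a = doc.createElement("a");
        Element b = doc.createElement("b");
        doc.appendChild(a);
        a.appendChild(b);
        b.appendChild(doc.createTextNode("Water T "));
        b.appendChild(doc.createTextNode("&"));
        b.appendChild(doc.createTextNode(" D-46816"));

        int before = b.getChildNodes().getLength();
        a.normalize(); // merges adjacent Text nodes in the whole subtree
        int after = b.getChildNodes().getLength();

        return new String[] {
            String.valueOf(before),
            String.valueOf(after),
            b.getFirstChild().getNodeValue()
        };
    }

    public static void main(String[] args) {
        String[] r = mergeAdjacentText();
        System.out.println(r[0] + " -> " + r[1] + ": " + r[2]);
    }
}
```

Running main prints "3 -> 1: Water T & D-46816". After the merge, taking
only the first text child is enough to get the full value.
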

regards,
Michael Pichler


----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-04-14 04:14

Message:
Logged In: NO 

I have just hit this bug (in production of course ;-).
Originally using dom4j 1.4, but still present using dom4j
1.5.2, jaxen 1.0FCS.

My xpath is of the form "//a/b[text()='value']/..".
This failed in one case because the 'b' element has been
parsed into two 'text' nodes. It seems it crossed some
buffer boundary in the parsing stage, as the two text values
are "TINBICS_SECOND" and "ARY_FEC" (i.e. just normal text).
In our other test cases this has been parsed as a single
text node. I verified the arbitrary splitting by adding
spaces earlier in the file, and the position of the split
moved accordingly "TINBICS_SEC" and "ONDARY_FEC".

Replacing the xpath with "//a[b='value']" solved the problem,
so this seems to be a problem with using "text()" in the xpath.
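
The reason the second form is robust is that b='value' compares against
the string-value of the <b> element, which concatenates all of its text
nodes however they were split. A minimal sketch, assuming the JDK's
javax.xml.xpath engine rather than dom4j/Jaxen, with hypothetical class
and method names and a hand-built document that reproduces the split:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: uses the JDK XPath engine, not dom4j/Jaxen.
class StringValuePredicateSketch {

    /** Counts the nodes selected by the expression in a document whose <b> text is split. */
    static int countMatches(String xpathExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            Element a = doc.createElement("a");
            Element b = doc.createElement("b");
            doc.appendChild(root);
            root.appendChild(a);
            a.appendChild(b);

            // Two adjacent text nodes, as in the split reported above.
            b.appendChild(doc.createTextNode("TINBICS_SECOND"));
            b.appendChild(doc.createTextNode("ARY_FEC"));

            Double count = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + xpathExpression + ")", doc, XPathConstants.NUMBER);
            return count.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The predicate tests the string-value of <b>, not an individual text node.
        System.out.println(countMatches("//a[b='TINBICS_SECONDARY_FEC']"));
    }
}
```

The element-based predicate finds the single <a> element even though no
individual text node holds the whole value.
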

The xpath spec says there should never be two adjacent text
nodes.
http://www.w3.org/TR/xpath#section-Text-Nodes

Second, the xpath spec says that 'text()' should select all
text nodes.
http://www.w3.org/TR/xpath#path-abbrev

I'm not sure if dom4j is "at fault", but it sure would be
nice if it could at least be resilient to the problem.

:-)

Andrew.


----------------------------------------------------------------------

Comment By: Steve Carter (cart33)
Date: 2005-02-13 05:06

Message:
Logged In: YES 
user_id=597933

Thanks for the explanation. Greatly appreciated. I have not
made myself familiar with the specification so I appreciate
your insight. It  just seemed intuitive to me that
selectSingleNode() would return the full value of the node
whether references were present or not. Feel free to close
this issue and pursue it as an enhancement as there are many
approaches to satisfy the solution. I enjoy using your api
and thanks again for the help. 

----------------------------------------------------------------------

Comment By: Maarten Coene (maartenc)
Date: 2005-02-12 16:04

Message:
Logged In: YES 
user_id=178745

On the other hand, I now see in section 5.7 of the XPath spec
that a text node shouldn't have immediately following
siblings that are text nodes themselves, so this could be a
bug indeed.

I'll investigate this further...

regards,
Maarten

----------------------------------------------------------------------

Comment By: Maarten Coene (maartenc)
Date: 2005-02-12 15:57

Message:
Logged In: YES 
user_id=178745

I don't think this is a bug.

The following happened:
The expression "a/b/text()" selects all text nodes of <b>.
Because it contains an entity reference, the SAX parser
you used created 3 text nodes: "Water T ", "&" and
" D-46816". The selectSingleNode() method returns the first
node: "Water T ". So this is correct.

expression "a/b" selects all <b> elements. If you apply the
string function to it, you will retrieve the string-value of
the <b> element. This expression should do the trick:
"string(a/b[1])", as illustrated by the example below:

String xml = "<a><b>Water T &amp; D-46816</b></a>";
Document doc = DocumentHelper.parseText(xml);
String result = (String) doc.selectObject("string(a/b[1])");

now, result is equal to "Water T & D-46816"

Another way is to retrieve the node and ask for the
string-value directly on the node:

Node node = doc.selectSingleNode("a/b");
String result = node.getStringValue();

I hope this helped you out. If you still feel this is a bug,
please tell me; otherwise I'll close this issue.

regards,
Maarten

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-02-10 18:49

Message:
Logged In: NO 

This problem affects other xpath query types such as /a/b/*
etc...

----------------------------------------------------------------------