Re: [dom4j-user] Extracting Text Fragments

James Strachan Tue, 24 Sep 2002 06:32:43 -0700

Approach 1 is the best: it makes a List of all the Text nodes, and you can then just index into the List if you like.

List fragments = root.selectNodes("text()");
Node node1 = (Node) fragments.get(0);
Node node2 = (Node) fragments.get(1);
String text1 = node1.getText();
String text2 = node2.getText();
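dom4j isn't part of the JDK, so here is a dependency-free sketch of the same list-then-index idea using the JDK's built-in DOM parser and javax.xml.xpath instead of dom4j's selectNodes(). It assumes <piece> is the document root, and the class and method names (TextFragmentsXPath, textFragments) are made up for illustration.

```java
// Sketch using the JDK's DOM and javax.xml.xpath, not dom4j.
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextFragmentsXPath {

    /** Returns the direct text() children of the root element, in document order. */
    static List<String> textFragments(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        XPath xpath = XPathFactory.newInstance().newXPath();
        // One evaluation collects every text node child of the root element.
        NodeList nodes = (NodeList) xpath.evaluate("text()", root, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<String> fragments = textFragments("<piece>text1<blob1>stuff</blob1>text2</piece>");
        // Index into the list, as in the dom4j snippet.
        System.out.println(fragments.get(0)); // text1
        System.out.println(fragments.get(1)); // text2
    }
}
```

"stuff" is not returned because it is a child of <blob1>, not of <piece>.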

Probably the fastest way though is just to iterate through the content looking for Text nodes...

for (Iterator iter = root.content().iterator(); iter.hasNext(); ) {
    Node node = (Node) iter.next();
    if (node instanceof Text) {
        String text = node.getText();
        // ...
    }
}
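For comparison, a sketch of the child-iteration approach, again using the JDK's org.w3c.dom API as a stand-in for dom4j's content() list and Text interface. The class and method names are hypothetical.

```java
// Sketch using the JDK's DOM API, not dom4j.
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextFragmentsLoop {

    /** Walks the root element's direct children and keeps only the text nodes. */
    static List<String> textFragments(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> result = new ArrayList<>();
        // No XPath involved: a single pass over the sibling chain.
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                result.add(child.getNodeValue());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String text : textFragments("<piece>text1<blob1>stuff</blob1>text2</piece>")) {
            System.out.println(text);
        }
    }
}
```

In the DOM API, CDATA sections are reported as CDATA_SECTION_NODE rather than TEXT_NODE, so this check skips them.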

James
-------
http://radio.weblogs.com/0112098/

----- Original Message -----

From: Terry Steichen

To: James Strachan

Sent: Tuesday, September 24, 2002 2:49 PM

Subject: Re: [dom4j-user] Extracting Text Fragments

James,

Yes, that works fine too. Thank you.

Question: Assuming that I needed to pick out several text segments (separated by other tags), it would seem that there are at least two ways to go about this.

Approach 1

List fragments = root.selectNodes("text()");
String tag1 = ((Node) fragments.get(0)).getText();
String tag2 = ((Node) fragments.get(1)).getText();
...

Approach 2

// XPath positions start at 1, so text()[0] would match nothing
String tag1 = root.selectSingleNode("text()[1]").getText();
String tag2 = root.selectSingleNode("text()[2]").getText();
...

Which one would be more efficient? (I presume - but don't know - that Approach 1 would be.)

Regards,

Terry
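One thing worth knowing about Approach 2: XPath positional predicates count from 1, so text()[0] never matches, while text()[1] and text()[2] are the first and second fragments. A small sketch demonstrating this with the JDK's javax.xml.xpath (not dom4j; the class and method names are hypothetical):

```java
// Sketch using the JDK's DOM and javax.xml.xpath, not dom4j.
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PositionalText {

    /** Parses an XML string and returns its root element. */
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Evaluates text()[position] against element; returns null when nothing matches. */
    static String textAt(Element element, int position) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate("text()[" + position + "]", element, XPathConstants.NODE);
        return node == null ? null : node.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Element root = parse("<piece>text1<blob1>stuff</blob1>text2</piece>");
        // XPath positions start at 1, so [0] selects an empty node-set.
        System.out.println(textAt(root, 0)); // null
        System.out.println(textAt(root, 1)); // text1
        System.out.println(textAt(root, 2)); // text2
    }
}
```

Each textAt call evaluates a separate XPath expression, whereas Approach 1 gathers every fragment in a single evaluation, so when you need several fragments Approach 1 generally does less work.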

----- Original Message -----

From: James Strachan

To: Terry Steichen ; dom4j-user

Sent: Tuesday, September 24, 2002 2:59 AM

Subject: Re: [dom4j-user] Extracting Text Fragments

I think in XPath something like the following would work...

piece/text()[1]

or, as Steve just said in a separate mail, you could just iterate through the Text nodes, using the first one you find.

James
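A sketch of "use the first Text node you find" with an early return, next to the XPath expression piece/text()[1] evaluated from the document node. Again this uses the JDK's DOM and javax.xml.xpath as a stand-in for dom4j, and the names are hypothetical.

```java
// Sketch using the JDK's DOM and javax.xml.xpath, not dom4j.
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstText {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the first direct text child of element, or null if it has none. */
    static String firstText(Element element) {
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                return child.getNodeValue(); // stop at the first match
            }
        }
        return null;
    }

    /** The same query as an XPath expression, evaluated from the document node. */
    static String firstTextXPath(Document doc) throws Exception {
        // A string result is the string-value of the first matching node, or "" if none match.
        return XPathFactory.newInstance().newXPath().evaluate("piece/text()[1]", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<piece>text1<blob1>stuff</blob1>text2</piece>");
        System.out.println(firstText(doc.getDocumentElement())); // text1
        System.out.println(firstTextXPath(doc));                 // text1
    }
}
```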

----- Original Message -----

From: Terry Steichen

To: dom4j-user

Sent: Monday, September 23, 2002 9:18 PM

Subject: [dom4j-user] Extracting Text Fragments

Assume I have an XML document including the following fragment:

<piece>text1<blob1>stuff</blob1>text2</piece>

How do you extract text1 separately from text2 - that is, programmatically distinguish where text1 ends and text2 begins?

Regards,

Terry
