Re: Fw: [dom4j-user] Parsing CDATA

bob mcwhirter Sat, 06 Jul 2002 07:23:34 -0700


Instead of doing text manipulation, can you use selectSingleNode(...) to
find the <body> node, then use dom4j's getText() to get its textual content?


You may then have to do some text manipulation to wrap an outer tag
around the content, though, as a well-formed XML doc has exactly
1 root element.

i.e.:

        String stuff = "<new-root>" + bodyElement.getText() + "</new-root>";

Then, you have some (hopefully) well-formed XML you can parse again.
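
Here is a rough, self-contained sketch of that round trip. Take it as an
illustration, not tested dom4j code: it swaps in the JDK's built-in DOM
parser so that it runs without extra jars. In dom4j the matching calls
would be roughly doc1.selectSingleNode("//body"), getText() on the result,
and DocumentHelper.parseText(stuff). The class name CdataReparse, the
<article> wrapper and the sample content are all made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CdataReparse {
    // Parse an XML string into a W3C DOM document (JDK parser, not dom4j).
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Pull the text out of the first <body> element (the CDATA content,
    // with its markup as plain characters), wrap it in one root element,
    // and parse it a second time so the inner tags become real elements.
    public static Document reparseBody(Document doc1) throws Exception {
        Element body = (Element) doc1.getElementsByTagName("body").item(0);
        String stuff = "<new-root>" + body.getTextContent() + "</new-root>";
        return parse(stuff);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: a <body> whose content is one CDATA lump.
        String xml = "<article><body><![CDATA["
                   + "<p>First <b>bold</b> text</p><p>Second</p>"
                   + "]]></body></article>";
        Document doc2 = reparseBody(parse(xml));

        // After the second parse the <p> elements can be selected directly.
        NodeList paragraphs = doc2.getElementsByTagName("p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            System.out.println(paragraphs.item(i).getTextContent());
        }
    }
}
```

The second parse only works if the CDATA content is itself well-formed
(every <p> and <b> closed, no stray & or <); otherwise the parser will
throw on the wrapped string.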

        -bob


On Sat, 6 Jul 2002, Terry Steichen wrote:

> Let me expand a bit on my earlier question.
> 
> I created an XML file (using XMLWriter) with an element called 'body'
> that contains a CDATA mixture of paragraphs (using the <p> and </p> tags),
> bold text (delimited with <b> and </b> tags) and ordinary text. I then
> read and parsed this XML file into Document doc1.
>  
> Next, I extract doc1.element("body").asXML() into a String called
> 'stuff'.
> 
> What I want to do is parse the contents of "body" into the component
> paragraph, highlighted text and regular text parts.  So, I created a
> string something like "<doc>" + stuff + "</doc>", used that to create
> a StringReader 's_in' and used a SAXReader.read(s_in) to create a new
> Document doc2.
> 
> Unfortunately, doc2 now contains an element 'body', instead of a set
> of 'p' elements. No matter what I do, I still end up with the 'body'
> element with all of its contents treated as a (CDATA) lump.  So I am
> unable to selectively extract the 'p', 'b' tags or text.
> 
> That's where I'm stumped.



