Bob, I did end up doing something like your suggestion. What I did seems pretty hokey to me (my implementation, not your suggestion), but it does work.
What I did was extract the body text and then apply two substring operations to remove the '<![CDATA[' and ']]>' markers. Then I enclosed the resulting string inside a root tag ('<doc>mystring</doc>') and parsed that.

As I said, I'm not really happy with this hack, but it lets me move ahead. Any suggestions for a more elegant solution would be much appreciated.

Regards,
Terry

----- Original Message -----
From: "bob mcwhirter" <[EMAIL PROTECTED]>
To: "Terry Steichen" <[EMAIL PROTECTED]>
Cc: "dom4j-user" <[EMAIL PROTECTED]>
Sent: Saturday, July 06, 2002 10:43 AM
Subject: Re: Fw: [dom4j-user] Parsing CDATA

>
> Instead of doing text manipulation, can you use selectSingleNode(...) to
> find the <body> node, then use the dom4j to get the textual content?
>
> You may then have to do some text manip to wrap some outer tags
> around the content, though, as a well-formed XML doc has exactly
> 1 root element.
>
> ie:
>
> String stuff = "<new-root>" + bodyElement.getText() + "</new-root>";
>
> Then, you have some (hopefully) well-formed XML you can parse again.
>
> -bob
>
>
> On Sat, 6 Jul 2002, Terry Steichen wrote:
>
> > Let me expand a bit on my earlier question.
> >
> > I created an XML file (using XMLWriter) with an element called 'body'
> > that contains a CDATA mixture of paragraphs (using the <p> and </p> tags),
> > bold text (delineated with <b> and </b> tags) and ordinary text. I then
> > read and parsed this XML file into Document doc1.
> >
> > Next, I extract doc1.element("body").asXML() into a String called
> > 'stuff'.
> >
> > What I want to do is parse the contents of "body" into the component
> > paragraph, highlighted text and regular text parts. So, I created a
> > string something like "<doc>" + stuff + "</doc>", used that to create
> > a StringReader 's_in' and used a SAXReader.read(s_in) to create a new
> > Document doc2.
> >
> > Unfortunately, doc2 now contains an element 'body', instead of a set
> > of 'p' elements. No matter what I do, I still end up with the 'body'
> > element with all of its contents treated as a (CDATA) lump. So I am
> > unable to selectively extract the 'p', 'b' tags or text.
> >
> > That's where I'm stumped.
> >
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Got root? We do.
> http://thinkgeek.com/sf
> _______________________________________________
> dom4j-user mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/dom4j-user
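
For reference, the extract, wrap, and re-parse workaround discussed in this thread can be sketched roughly as below. This is only an illustration under some assumptions: it uses the JDK's built-in DOM parser rather than dom4j so that it runs without extra jars, the class name CdataReparse and its helper methods are hypothetical, and the sample XML is made up. It also assumes the CDATA content is itself well-formed markup. Reading the element's text content, instead of its serialized XML, is what avoids the manual substring step, since the parser has already dropped the CDATA markers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses the JDK DOM API as a stand-in for dom4j.
final class CdataReparse {

    // Parse an XML string with the JDK's built-in DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }

    // Read the text of the first <body> element (CDATA markers are already
    // gone at this point), wrap it in a single <doc> root, and parse it again.
    static Document reparseBody(Document outer) {
        Element body = (Element) outer.getElementsByTagName("body").item(0);
        String markup = body.getTextContent();
        return parse("<doc>" + markup + "</doc>");
    }

    // Collect the text of every <p> element in the re-parsed fragment.
    static List<String> paragraphs(Document inner) {
        NodeList nodes = inner.getElementsByTagName("p");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample document with markup hidden inside a CDATA section.
        String xml = "<article><body><![CDATA[<p>First <b>bold</b> part.</p>"
                + "<p>Second.</p>]]></body></article>";
        Document inner = reparseBody(parse(xml));
        System.out.println(paragraphs(inner));
    }
}
```

The second parse fails if the CDATA content is not well-formed (an unclosed tag or an undeclared entity, for example), so the wrapper root only fixes the single-root requirement. In dom4j the analogous steps would presumably be Element.getText() followed by a second parse of the wrapped string, but that is worth checking against the dom4j version in use.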