Bob,

I did end up doing something like your suggestion.  It seems pretty hokey
(not your suggestion - what I did) to me, but -- it does work.

What I did was extract the body text and then use two substring operations
to remove the '<![CDATA[' and ']]>'.  Then I enclosed the resulting string
inside a root tag ('<doc>mystring</doc>') and parsed that.
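
A minimal sketch of that extract, wrap, and re-parse step follows. It is an
illustration only, not the code actually used: it assumes the JDK's built-in
DOM parser rather than dom4j so that it runs without extra jars, and the class
name CdataReparse, the method paragraphs, and the sample <article> input are
all hypothetical. It also assumes the document contains a <body> element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CdataReparse {

    // Parse an XML string with the JDK's DOM parser. Checked exceptions are
    // wrapped so the rest of the sketch stays free of throws clauses.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + xml, e);
        }
    }

    // Return the text of each <p> found inside the CDATA payload of <body>.
    // Hypothetical helper; assumes a <body> element is present.
    static List<String> paragraphs(String xml) {
        Document outer = parse(xml);
        // getTextContent() returns the CDATA payload without the
        // "<![CDATA[" and "]]>" markers, so no substring calls are needed.
        String payload = outer.getElementsByTagName("body").item(0).getTextContent();
        // Wrap the fragment in a single root element so it parses as a document.
        Document inner = parse("<doc>" + payload + "</doc>");
        NodeList ps = inner.getElementsByTagName("p");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ps.getLength(); i++) {
            result.add(ps.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input with markup stored inside a CDATA section.
        String xml = "<article><body><![CDATA[<p>First <b>bold</b> text</p>"
                + "<p>Second</p>]]></body></article>";
        System.out.println(paragraphs(xml));
    }
}
```

Reading the element's text through the parser, rather than from its serialized
form, is what makes the manual stripping of the CDATA markers unnecessary in
this sketch. The second parse will still fail if the payload is not itself
well-formed markup.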

As I said, I'm not really happy with this hack, but it lets me move ahead.
Any suggestions on a more elegant solution would be much appreciated.

Regards,

Terry

----- Original Message -----
From: "bob mcwhirter" <[EMAIL PROTECTED]>
To: "Terry Steichen" <[EMAIL PROTECTED]>
Cc: "dom4j-user" <[EMAIL PROTECTED]>
Sent: Saturday, July 06, 2002 10:43 AM
Subject: Re: Fw: [dom4j-user] Parsing CDATA


>
> Instead of doing text manipulation, can you use selectSingleNode(...) to
> find the <body> node, then use dom4j to get the textual content?
>
> You may then have to do some text manipulation to wrap some outer tags
> around the content, though, as a well-formed XML doc has exactly
> 1 root element.
>
> ie:
>
> String stuff = "<new-root>" + bodyElement.getText() + "</new-root>";
>
> Then, you have some (hopefully) well-formed XML you can parse again.
>
> -bob
>
>
> On Sat, 6 Jul 2002, Terry Steichen wrote:
>
> > Let me expand a bit on my earlier question.
> >
> > I created an XML file (using XMLWriter) with an element called 'body'
> > that contains a CDATA mixture of paragraphs (using the <p> and </p> tags),
> > bold text (delineated with <b> and </b> tags) and ordinary text. I then
> > read and parsed this XML file into Document doc1.
> >
> > Next, I extract doc1.element("body").asXML() into a String called
> > 'stuff'.
> >
> > What I want to do is parse the contents of "body" into the component
> > paragraph, highlighted text and regular text parts.  So, I created a
> > string something like "<doc>" + stuff + "</doc>", used that to create
> > a StringReader 's_in' and used a SAXReader.read(s_in) to create a new
> > Document doc2.
> >
> > Unfortunately, doc2 now contains an element 'body', instead of a set
> > of 'p' elements. No matter what I do, I still end up with the 'body'
> > element with all of its contents treated as a (CDATA) lump.  So I am
> > unable to selectively extract the 'p', 'b' tags or text.
> >
> > That's where I'm stumped.
>
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Got root? We do.
> http://thinkgeek.com/sf
> _______________________________________________
> dom4j-user mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/dom4j-user


