Re: Fw: [dom4j-user] Parsing CDATA

James Strachan Tue, 09 Jul 2002 23:01:19 -0700

Have you got an example XML document and a little bit of the code that
you're running? You really can use the content() List of a Branch (Document
or Element) to walk the entire XML tree. Perhaps you're looking at the wrong
element, or the XML elements are actually embedded inside a CDATA section?
An example XML document would help here...
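
To illustrate the second possibility: markup placed inside a CDATA section
is plain character data, so a parser hands it back as one node rather than
as child elements. Below is a minimal sketch of that behaviour. It uses the
JDK's built-in DOM parser instead of dom4j so that it runs without extra
jars; the class name, method names and sample documents are made up for
illustration and are not taken from the thread.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: compares <body> children with and without CDATA.
class CdataChildKinds {

    // Parse an XML string with the JDK's default (non-coalescing) DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Describe the direct children of the first <body> element,
    // e.g. "ELEMENT(p)" for an element child or "CDATA" for a CDATA section.
    static String describeBodyChildren(String xml) {
        Element body = (Element) parse(xml).getElementsByTagName("body").item(0);
        NodeList children = body.getChildNodes();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (i > 0) {
                out.append(", ");
            }
            switch (child.getNodeType()) {
                case Node.ELEMENT_NODE:
                    out.append("ELEMENT(").append(child.getNodeName()).append(")");
                    break;
                case Node.CDATA_SECTION_NODE:
                    out.append("CDATA");
                    break;
                case Node.TEXT_NODE:
                    out.append("TEXT");
                    break;
                default:
                    out.append("OTHER");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same markup twice: once hidden inside CDATA, once as real elements.
        String wrapped = "<doc><body><![CDATA[<p>Hello <b>world</b></p>]]></body></doc>";
        String plain = "<doc><body><p>Hello <b>world</b></p></body></doc>";
        System.out.println("inside CDATA: " + describeBodyChildren(wrapped));
        System.out.println("real markup:  " + describeBodyChildren(plain));
    }
}
```

If the first line reports no element children while the second does, the
<p> and <b> tags in the source document are text as far as the parser is
concerned, and no amount of walking the tree will turn them into elements.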


James
----- Original Message -----
From: "Terry Steichen" <[EMAIL PROTECTED]>
To: "James Strachan" <[EMAIL PROTECTED]>
Cc: "dom4j-user" <[EMAIL PROTECTED]>
Sent: Tuesday, July 09, 2002 6:25 PM
Subject: Re: Fw: [dom4j-user] Parsing CDATA


> James,
>
> Thanks for the comments.  Unfortunately, however, your suggestion doesn't
> seem to work.  It seems that the List that's returned ('content' in your
> suggested code below) only contains one item: the CDATA section.  So it
> still doesn't let me decompose that into the component <p> and <b>
> elements that exist within the CDATA.  Maybe I'm doing something wrong,
> but that seems to be the behavior.
>
> Regards,
>
> Terry
>
> ----- Original Message -----
> From: "James Strachan" <[EMAIL PROTECTED]>
> To: "Terry Steichen" <[EMAIL PROTECTED]>
> Cc: "dom4j-user" <[EMAIL PROTECTED]>
> Sent: Monday, July 08, 2002 5:32 AM
> Subject: Re: Fw: [dom4j-user] Parsing CDATA
>
>
> > From: "Terry Steichen" <[EMAIL PROTECTED]>
> > > Bob,
> > >
> > > I did end up doing something like your suggestion.  It seems pretty
> > > hokey (not your suggestion - what I did) to me, but -- it does work.
> > >
> > > What I did was extract the body text and then did two substring
> > > operations to remove the '<![CDATA[' and ']]'.  Then I enclosed the
> > > resultant string inside a root tag ('<doc>mystring</doc>') and parsed
> > > that.
> > >
> > > As I said, I'm not really happy with this hack, but it lets me move
> > > ahead.  Any suggestions on a more elegant solution would be much
> > > appreciated.
> >
> > I'm still not quite sure what you really want to do. You should not need
> > to parse text or do substring operations. The dom4j Element will contain
> > a tree of Node implementations which in your case will probably be a
> > mixture of Element, Text and CDATA nodes. So you should just be able to
> > use regular Java 2 Collections code and use 'instanceof' to determine
> > which nodes you want to process and which you don't.
> >
> > When you say...
> >
> > > What I want to do is parse the contents of "body" into the component
> > > paragraph, highlighted text and regular text parts
> >
> > you could just iterate over the contents of <body> and process things
> > however you wish...
> >
> > Element body = (Element) doc.selectSingleNode( "/doc/body" );
> > List content = body.content();
> > for (Iterator iter = content.iterator(); iter.hasNext(); ) {
> >     Node child = (Node) iter.next();
> >     if ( child instanceof Element ) {
> >         // ... process this element, could be a <b> or <p> etc.
> >     }
> >     else if ( child instanceof Text ) {
> >         // ... it's a block of text...
> >     }
> > }
> >
> > James
> >
> >
>
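
The workaround Terry describes in the quoted messages above (take the text
out of the CDATA section, wrap it in a root element, parse it again) can be
sketched without any substring operations, because a parser normally
returns CDATA content with the delimiters already removed. The sketch below
again uses the JDK's DOM API rather than dom4j, so it is an analogy and not
dom4j code; the wrapper element name "fragment", the class and method names
and the sample document are all hypothetical. One difference worth
checking: in the W3C DOM a CDATASection is a kind of Text, whereas in dom4j
CDATA and Text are, as far as I know, separate node types, so a dom4j loop
that only tests 'instanceof Text' may skip CDATA nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical demo class: re-parses markup that was hidden in a CDATA section.
class CdataReparse {

    // Parse an XML string with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Label each direct child of an element as an element or a run of text.
    static List<String> describeChildren(Element parent) {
        List<String> parts = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                parts.add("element <" + n.getNodeName() + ">: " + n.getTextContent());
            } else if (n instanceof Text) {
                // In the W3C DOM, CDATASection extends Text, so both land here.
                parts.add("text: " + n.getNodeValue());
            }
        }
        return parts;
    }

    // Pull the character data out of <body>, wrap it in a root element,
    // parse that as XML and describe the resulting children.
    static List<String> decompose(String xml) {
        Element body = (Element) parse(xml).getElementsByTagName("body").item(0);
        // getTextContent() returns the CDATA content without its delimiters.
        String inner = body.getTextContent();
        // "fragment" is an arbitrary wrapper name chosen for this sketch.
        Document reparsed = parse("<fragment>" + inner + "</fragment>");
        return describeChildren(reparsed.getDocumentElement());
    }

    public static void main(String[] args) {
        String xml = "<doc><body><![CDATA[<p>First</p> and <b>bold</b> text]]></body></doc>";
        for (String part : decompose(xml)) {
            System.out.println(part);
        }
    }
}
```

This only works when the CDATA content is itself well-formed once wrapped
in a single root element; anything else (unclosed tags, stray '&' or '<')
will make the second parse fail, and that may be why the text was put in a
CDATA section in the first place.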


_______________________________________________
dom4j-user mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-user
