From: "Terry Steichen" <[EMAIL PROTECTED]>
> Bob,
>
> I did end up doing something like your suggestion. It seems pretty hokey
> (not your suggestion - what I did) to me, but -- it does work.
>
> What I did was extract the body text and then did two substring operations
> to remove the '<![CDATA[' and ']]'. Then I enclosed the resultant string
> inside a root tag ('<doc>mystring</doc>') and parsed that.
>
> As I said, I'm not really happy with this hack, but it lets me move ahead.
> Any suggestions on a more elegant solution would be much appreciated.

I'm still not quite sure what you really want to do. You should not need to
parse text or do substring operations. The dom4j Element will contain a tree
of Node implementations, which in your case will probably be a mixture of
Element, Text and CDATA nodes. So you should be able to use regular Java 2
Collections code, with 'instanceof' to decide which nodes you want to process
and which you don't. When you say...

> What I want to do is parse the contents of "body" into the component
> paragraph, highlighted text and regular text parts

you could just iterate over the contents of <body> and process things however
you wish...

    // doc is the already-parsed org.dom4j.Document
    Element body = (Element) doc.selectSingleNode( "/doc/body" );
    List content = body.content();
    for ( Iterator iter = content.iterator(); iter.hasNext(); ) {
        Node child = (Node) iter.next();
        if ( child instanceof Element ) {
            // ... process this element, could be a <b> or <p> etc.
        }
        else if ( child instanceof Text ) {
            // ... it's a block of text ...
        }
    }

James
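
For reference, here is a minimal, self-contained sketch of the same 'instanceof' dispatch over the children of <body>. Note that it uses the JDK's built-in org.w3c.dom API rather than dom4j, purely so that it runs without extra jars; the class name BodyContentWalker, the describe helper and the sample XML are made up for illustration and do not come from the thread. It also adds a CDATA branch, which in org.w3c.dom has to be tested before Text because CDATASection is a subtype of Text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper: walks the mixed content of the first <body> element.
public class BodyContentWalker {

    // Returns one label per direct child node of <body>.
    public static List<String> describe(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element body = (Element) doc.getElementsByTagName("body").item(0);

        List<String> parts = new ArrayList<>();
        NodeList children = body.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                // Could be a <b>, a <p>, etc.
                parts.add("element <" + child.getNodeName() + ">: "
                        + child.getTextContent());
            } else if (child instanceof CDATASection) {
                // Tested before Text: CDATASection extends Text in org.w3c.dom.
                parts.add("cdata: " + child.getNodeValue());
            } else if (child instanceof Text) {
                parts.add("text: " + child.getNodeValue());
            }
        }
        return parts;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document with element, text and CDATA children.
        String xml = "<doc><body>intro <b>bold</b><p>para</p>"
                   + "<![CDATA[raw <i>text</i>]]></body></doc>";
        for (String part : describe(xml)) {
            System.out.println(part);
        }
    }
}
```

One thing the sketch makes visible: whatever sits inside a CDATA section is character data, not markup, so the <i> in the sample arrives as literal text in the CDATA node rather than as a child element.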