From: "Terry Steichen" <[EMAIL PROTECTED]>
> Bob,
>
> I did end up doing something like your suggestion.  It seems pretty hokey
> (not your suggestion - what I did) to me, but -- it does work.
>
> What I did was extract the body text and then did two substring operations
> to remove the '<![CDATA[' and ']]'.  Then I enclosed the resultant string
> inside a root tag ('<doc>mystring</doc>') and parsed that.
>
> As I said, I'm not really happy with this hack, but it lets me move ahead.
> Any suggestions on a more elegant solution would be much appreciated.

I'm still not quite sure what you really want to do. You should not need to
parse text or do substring operations. The dom4j Element will contain a tree
of Node implementations which in your case will probably be a mixture of
Element, Text and CDATA nodes. So you should just be able to use regular
Java 2 Collections code and use 'instanceof' to determine which nodes you
want to process and which you don't.

When you say...

> What I want to do is parse the contents of "body" into the component
> paragraph, highlighted text and regular text parts

you could just iterate over the contents of <body> and process things
however you wish...

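// needs: java.util.Iterator, java.util.List, org.dom4j.*
Element body = (Element) doc.selectSingleNode( "/doc/body" );
List content = body.content();
for (Iterator iter = content.iterator(); iter.hasNext(); ) {
    Node child = (Node) iter.next();
    if ( child instanceof Element ) {
        // process this element; it could be a <b>, a <p>, etc.
    }
    else if ( child instanceof Text ) {
        // it's a block of plain text
    }
    else if ( child instanceof CDATA ) {
        // it's a CDATA section; in dom4j, CDATA is a separate node type from Text
    }
}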
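
If it helps to try the idea without dom4j on the classpath, here is a minimal, self-contained sketch of the same walk using the JDK's built-in W3C DOM parser instead. Note that this is a swapped-in API, not dom4j: the class name MixedContentWalk, the method describeBody and the sample XML are all made up for illustration. One difference worth knowing is that in the W3C DOM a CDATASection is also a Text, so it has to be tested before Text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class MixedContentWalk {

    // Walks the direct children of the first <body> element and
    // labels each child according to its node type.
    static List<String> describeBody(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element body = (Element) doc.getDocumentElement()
                                    .getElementsByTagName("body").item(0);

        List<String> parts = new ArrayList<>();
        NodeList children = body.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                // e.g. a <p> or <b>; record its name and text content
                parts.add("element:" + child.getNodeName()
                          + "=" + child.getTextContent());
            } else if (child instanceof CDATASection) {
                // CDATASection extends Text in the W3C DOM, so check it first
                parts.add("cdata:" + child.getNodeValue());
            } else if (child instanceof Text) {
                parts.add("text:" + child.getNodeValue());
            }
        }
        return parts;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document with mixed content inside <body>
        String xml = "<doc><body><p>First paragraph</p>plain "
                   + "<b>bold</b><![CDATA[a < b]]></body></doc>";
        for (String part : describeBody(xml)) {
            System.out.println(part);
        }
    }
}
```

Running main prints one labelled line per child of <body>, in document order. No substring handling or re-parsing is involved; the parser has already split the content into nodes.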

James

