Sounds like you may want to run the HTML section through JTidy
(http://jtidy.sourceforge.net) to convert it to XHTML first.  Then
Digester should be able to at least parse it.

Regards,

Adrian Sutton. 

-----Original Message-----
From: Simon Kitching [mailto:[EMAIL PROTECTED] 
Sent: Monday, 24 May 2004 8:39 AM
To: Jakarta Commons Users List
Subject: Re: [digester] reading embedded HTML (or other mixed text)

On Fri, 2004-05-21 at 12:34, Bill Keese wrote:
> Is there any way to tell digester to read in the entire content of an
> element (including text and sub-elements) as a single String? For
> example, if I persist e-mail to XML, I'd like to use digester to read
> the e-mail address list, etc., but the HTML content of the mail should
> be read verbatim.
> 

Hi Bill,

HTML is generally not well-formed XML. Digester uses a standard XML
parser to read its input, so it cannot process an input document that
is not well-formed XML.

As Jose has said in a separate reply, you could wrap your HTML in a
CDATA section in the input document. The XML parser will then see the
contents of that CDATA section as just a text string - and so will
Digester.
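
A minimal sketch of the CDATA approach, using the JDK's SAX parser directly rather than Digester so that it runs on its own. The element names (mail, to, body), the sample address, and the class and method names are invented for illustration. The HTML inside the CDATA section arrives through the normal characters() callback as plain text, which is the same text a Digester body-text rule should see:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Digester.
class CdataBodyDemo {

    /** Returns the character data found inside <body> elements. */
    static String readBody(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private boolean inBody = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("body".equals(qName)) {
                    inBody = true;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("body".equals(qName)) {
                    inBody = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // CDATA content is reported here like any other text.
                if (inBody) {
                    text.append(ch, start, length);
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The <br> below is not well-formed XML, but inside CDATA it is just text.
        String xml = "<mail><to>someone@example.com</to>"
                + "<body><![CDATA[<p>Hello<br>world</p>]]></body></mail>";
        System.out.println(readBody(xml));
    }
}
```

One caveat with this approach: the HTML itself must not contain the sequence ]]>, since that ends the CDATA section.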

Alternatively, you could store the content as XHTML, which is
well-formed XML and which most browsers support. In that case, you
could use NodeCreateRule to capture the element as a DOM node.
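
A rough sketch of what can be done once the XHTML content is available as a DOM node. This does not use Digester or NodeCreateRule; it uses the JDK's own DOM parser as a stand-in to obtain the node, and then serializes that node back to a String. The element names and the class and method names are assumptions made for the example:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Digester.
class XhtmlNodeDemo {

    /** Parses the document and returns the first element with the given tag name. */
    static Node firstElement(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tagName).item(0);
    }

    /** Serializes a DOM node (element plus sub-elements and text) to a String. */
    static String serialize(Node node) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // XHTML content: every tag is closed, so the whole mail is well-formed XML.
        String xml = "<mail><to>someone@example.com</to>"
                + "<body><p>Hello<br/>world</p></body></mail>";
        System.out.println(serialize(firstElement(xml, "body")));
    }
}
```

The serialized markup may differ from the input in small ways, such as how empty elements are written, so this gives equivalent rather than byte-for-byte verbatim content.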

Regards,

Simon 


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



