Another alternative might be to use nekoHTML
(http://www.apache.org/~andyc/) to bootstrap a SAX pipeline.
- robert
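Here is a minimal sketch of such a SAX pipeline that uses only the JDK. It assumes that nekoHTML's parser (`org.cyberneko.html.parsers.SAXParser`) implements `org.xml.sax.XMLReader`, so it could be passed as `source` in place of the JDK reader used here. The `SaxPipeline` class and its `LowerCaseFilter` stage are hypothetical illustrations, not part of nekoHTML or Digester.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch of a SAX pipeline: source reader -> filter stage -> final handler. */
final class SaxPipeline {

    /** Hypothetical pipeline stage: lower-cases element names before passing events on. */
    static final class LowerCaseFilter extends XMLFilterImpl {
        LowerCaseFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, local.toLowerCase(), qName.toLowerCase(), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, local.toLowerCase(), qName.toLowerCase());
        }
    }

    /** Runs input through the pipeline and returns the element names the end handler saw. */
    static String elementNames(XMLReader source, String input) {
        StringBuilder names = new StringBuilder();
        XMLFilterImpl filter = new LowerCaseFilter(source);
        // The final consumer; Digester would sit at this position in a real pipeline.
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.append(qName).append(' ');
            }
        });
        try {
            // XMLFilterImpl.parse registers the filter as the source's ContentHandler.
            filter.parse(new InputSource(new StringReader(input)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return names.toString().trim();
    }

    /** The JDK's own SAX reader, standing in for an HTML-tolerant reader. */
    static XMLReader jdkReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }
}
```

For example, `SaxPipeline.elementNames(SaxPipeline.jdkReader(), "<MAIL><Body>x</Body></MAIL>")` returns `"mail body"`.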
On 23 May 2004, at 23:44, Adrian Sutton wrote:
Sounds like you may want to run the HTML section through JTidy
(http://jtidy.sourceforge.net) to convert it to XHTML first. Then
Digester should be able to at least parse it.
Regards,
Adrian Sutton.
-----Original Message-----
From: Simon Kitching [mailto:[EMAIL PROTECTED]
Sent: Monday, 24 May 2004 8:39 AM
To: Jakarta Commons Users List
Subject: Re: [digester] reading embedded HTML (or other mixed text)
On Fri, 2004-05-21 at 12:34, Bill Keese wrote:
Is there any way to tell digester to read in the entire content of an
element (including text and sub-elements) as a single String? For
example, if I persist e-mail to XML, I'd like to use digester to read
the e-mail address list, etc., but the HTML content of the mail should
be read verbatim.
Hi Bill,
HTML is not, in general, well-formed XML. Digester uses a standard XML
parser to parse its input, so it cannot process an input document that
is not well-formed XML.
As Jose has said in a separate reply, you could wrap your HTML in a
CDATA section in the input document. The XML parser will then see the
contents of that CDATA section as just a text string, and so will
Digester.
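The following small sketch shows what the parser reports for a CDATA section. It uses the JDK's SAX parser rather than Digester itself, and the `<mail>`, `<to>` and `<body>` element names are hypothetical. The unclosed `<br>` would be a fatal parse error outside CDATA, but inside it is just text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: the contents of a CDATA section arrive as ordinary character data. */
final class CdataBodyReader {

    /** Returns the text inside the (hypothetical) <body> element of the given XML. */
    static String readBody(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inBody;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("body")) inBody = true;
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("body")) inBody = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks; append them all.
                if (inBody) text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return text.toString();
    }
}
```

For example, `CdataBodyReader.readBody("<mail><body><![CDATA[<p>Hi<br></p>]]></body></mail>")` returns `<p>Hi<br></p>` unchanged.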
Alternatively, you could use XHTML, which most browsers support. In
this case, you could then use NodeCreateRule.
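With XHTML the content is well-formed, so NodeCreateRule can hand it to you as a DOM node. The sketch below assumes you already have such an `org.w3c.dom.Element` and turns it back into a markup string with the JDK's identity `Transformer`. The `firstElement` helper and the `<content>` element name only stand in for what Digester would supply and are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Sketch: turn a DOM element (such as one NodeCreateRule builds) back into markup. */
final class XhtmlContentReader {

    /** Serializes the element and everything under it, without an XML declaration. */
    static String toMarkup(Element element) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(element), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize element", e);
        }
    }

    /** Hypothetical stand-in for Digester: returns the first element named tagName. */
    static Element firstElement(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName(tagName).item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }
}
```

Working from the DOM node rather than raw text means the XHTML stays inspectable (you can walk or modify it) before you decide to store it as a String.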
Regards,
Simon
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]