From: "Donald Ball" <[EMAIL PROTECTED]>
> On Fri, 2002-04-19 at 18:55, James Strachan wrote:
> > From: "Donald Ball" <[EMAIL PROTECTED]>
> > > this is a slightly strange request, but bear with me. in our app, we're
> > > letting users enter a subset of x/html (no script, embed, etc.). we're
> > > parsing using the dom4j SAXReader with validation turned on. it all
> > > works very well, thanks for the great tools. however, we'd now like to
> > > relax the rules a bit and let users enter a subset of html.
> >
> > i.e. you want to allow malformed XML? like
> >
> > <html>
> >     <body>
> >         <p>
> >         hello
> >         <p>
> >         <br>
> >     </body>
> > </html>
>
> exactly
>
> > There have been some developments lately of parsers that can accept HTML as
> > input but behave like XML parsers and balance tags and so forth. So they
> > behave just like a regular SAX parser.
> >
> > A promising example is here:-
> >
> > http://hotsax.sourceforge.net/
>
> seems pretty dead to me, actually. that's a shame.

Agreed. Now that I've looked into it, I think NekoHTML is the way to go,
though JTidy can still be useful.

> > Also Andy Clark from the Xerces team has put together a HTML parser called
> > NekoHTML which looks really cool (and could well be a great event-based
> > replacement for JTidy).
> >
> > http://www.apache.org/~andyc/
> >
> > I think it's moving into the Xerces codebase soon.
>
> this looks more promising, but we haven't tested our app with xerces-2
> yet. i think this will be the long term solution though.

It should be usable right now. You can keep whatever parser you're using for
XML and just use xerces-2 for the HTML parsing pretty easily.
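
A minimal sketch of that split, under some assumptions: the class name
org.cyberneko.html.parsers.SAXParser is taken to be NekoHTML's SAX parser
(it needs the NekoHTML and Xerces2 jars on the classpath), and ReaderChoice
and its method names are made up for illustration. The strict XML path uses
only the JDK's default parser, and the HTML reader is loaded by name so the
XML-only code never touches Xerces2.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ReaderChoice {
    // Assumption: NekoHTML's SAX parser class; requires the NekoHTML and Xerces2 jars.
    static final String NEKO_SAX_PARSER = "org.cyberneko.html.parsers.SAXParser";

    /** The strict XML reader the application already uses (here, the JDK default). */
    static XMLReader strictXmlReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    /** A tag-balancing HTML reader, loaded reflectively so it is only needed for HTML input. */
    static XMLReader htmlReader() throws Exception {
        return (XMLReader) Class.forName(NEKO_SAX_PARSER)
                .getDeclaredConstructor().newInstance();
    }

    /** Returns true if the strict reader accepts the text as well-formed XML. */
    static boolean parsesAsXml(String text) throws Exception {
        XMLReader reader = strictXmlReader();
        // DefaultHandler ignores content events and rethrows fatal errors.
        DefaultHandler handler = new DefaultHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed: a candidate for the HTML reader
        }
    }
}
```

dom4j's SAXReader has a constructor that takes an XMLReader, so the result
of htmlReader() can be handed to it for user-entered HTML while the
existing SAXReader keeps handling XML.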

James

_______________________________________________
dom4j-user mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-user