From: "Donald Ball" <[EMAIL PROTECTED]>
> On Fri, 2002-04-19 at 18:55, James Strachan wrote:
> > From: "Donald Ball" <[EMAIL PROTECTED]>
> > > this is a slightly strange request, but bear with me. in our app, we're
> > > letting users enter a subset of x/html (no script, embed, etc.). we're
> > > parsing using the dom4j SAXReader with validation turned on. it all
> > > works very well, thanks for the great tools. however, we'd now like to
> > > relax the rules a bit and let users enter a subset of html.
> >
> > i.e. you want to allow malformed XML? like
> >
> > <html>
> > <body>
> > <p>
> > hello
> > <p>
> > <br>
> > </body>
> > </html>
>
> exactly
>
> > There have been some developments lately of parsers that can accept HTML as
> > input but behave like XML parsers and balance tags and so forth. So they
> > behave just like a regular SAX parser.
> >
> > A promising example is here:-
> >
> > http://hotsax.sourceforge.net/
>
> seems pretty dead to me, actually. that's a shame.

Agreed. Now that I've looked into it, I think NekoHTML is the way to go,
though JTidy can still be useful.

> > Also Andy Clark from the Xerces team has put together a HTML parser called
> > NekoHTML which looks really cool (and could well be a great event-based
> > replacement for JTidy).
> >
> > http://www.apache.org/~andyc/
> >
> > I think it's moving into the Xerces codebase soon.
>
> this looks more promising, but we haven't tested our app with xerces-2
> yet. i think this will be the long term solution though.

It should be usable right now. You can keep whatever parser you're using for
XML and just use xerces-2 for the HTML parsing pretty easily.

James
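
A minimal sketch of the "keep your XML parser, swap in another one for HTML" idea, assuming the application code is written against the standard SAX XMLReader interface. It uses only the strict parser that ships with the JDK; the names SwappableParser, parses and strictXmlReader are hypothetical and exist only for illustration. The strict reader accepts a well-formed version of the sample above and rejects the tag-soup version. A tag-balancing XMLReader (NekoHTML's SAX parser is the candidate discussed in this thread) would be handed to the same method instead. That part is an assumption and is not exercised here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SwappableParser {

    /** Returns true if the given reader parses the markup without a fatal error. */
    public static boolean parses(XMLReader reader, String markup) throws Exception {
        // DefaultHandler ignores content events and rethrows fatal errors,
        // so malformed input surfaces as a SAXException from parse().
        DefaultHandler handler = new DefaultHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    /** The strict XML parser bundled with the JDK (stands in for "whatever parser you're using"). */
    public static XMLReader strictXmlReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String wellFormed = "<html><body><p>hello</p><p><br/></p></body></html>";
        String tagSoup = "<html><body><p>hello<p><br></body></html>";

        System.out.println(parses(strictXmlReader(), wellFormed)); // prints true
        System.out.println(parses(strictXmlReader(), tagSoup));    // prints false

        // Assumption: an HTML-tolerant XMLReader could be passed to parses()
        // here in place of strictXmlReader(), leaving the XML path untouched.
    }
}
```

Because the calling code only sees XMLReader, the XML path stays untouched and only the HTML input needs the lenient reader.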
