From: "Donald Ball" <[EMAIL PROTECTED]>
> this is a slightly strange request, but bear with me. in our app, we're
> letting users enter a subset of x/html (no script, embed, etc.). we're
> parsing using the dom4j SAXReader with validation turned on. it all
> works very well, thanks for the great tools. however, we'd now like to
> relax the rules a bit and let users enter a subset of html.
I.e. you want to allow malformed XML? Something like:
<html>
<body>
<p>
hello
<p>
<br>
</body>
</html>
> my thought is that we will try the SAXReader and, if we get a
> DocumentException when we're parsing, we'll fall back onto using Tidy to
> generate a DOM document and then use the DOMReader to turn it into a
> dom4j document. question is, how do i validate using our modified html
> dtd? i could spit out the dom4j document as sax and then read it back in
> again, but that seems pretty silly. is there a better way?
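The try-strict-then-fall-back idea you describe can be sketched roughly as
below. This is only an illustration using the plain JDK parser so that it is
self-contained: FallbackParser and LenientParser are hypothetical names, and
the lenient step is a stub. In your setup the strict step would be the
SAXReader and the lenient step would be Tidy followed by the DOMReader.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: try a strict XML parse first and only hand the
// markup to a lenient (tidying) parser when the strict parse fails.
class FallbackParser {

    // Placeholder for whatever repairs sloppy HTML (e.g. Tidy).
    interface LenientParser {
        Document parse(String markup) throws Exception;
    }

    private final LenientParser lenient;

    FallbackParser(LenientParser lenient) {
        this.lenient = lenient;
    }

    Document parse(String markup) throws Exception {
        try {
            return parseStrict(markup);
        } catch (SAXException e) {
            // Not well-formed XML: fall back to the lenient parser.
            return lenient.parse(markup);
        }
    }

    // Strict, non-validating parse with the JDK's built-in parser.
    static Document parseStrict(String markup) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler ignores warnings and rethrows fatal errors,
        // which keeps the parser from printing to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(markup)));
    }

    public static void main(String[] args) throws Exception {
        // Stub standing in for a real tidying parser: it simply returns
        // a hand-repaired version of the input.
        LenientParser stub = m -> parseStrict(
                "<html><body><p>hello</p><p><br/></p></body></html>");
        FallbackParser parser = new FallbackParser(stub);

        Document doc = parser.parse("<html><body><p>hello<p><br></body></html>");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Note that this only covers well-formedness. Validating the repaired
document against your modified DTD is a separate step, which is the part
your question is really about.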
Several parsers have appeared lately that accept HTML as input but behave
like XML parsers, balancing tags and so forth, so they can be used just
like a regular SAX parser.
A promising example is here:
http://hotsax.sourceforge.net/
It's quite early in the development cycle but might be worth a try. It's like
a SAX-based JTidy. It's a shame JTidy is based on DOM and not SAX; it'd be
much more reusable if it were SAX-based.
Also, Andy Clark from the Xerces team has put together an HTML parser called
NekoHTML which looks really cool (and could well be a great event-based
replacement for JTidy).
http://www.apache.org/~andyc/
I think it's moving into the Xerces codebase soon.
The downside of this approach is that it currently uses XNI instead of SAX.
It might be possible to use it already if there's an XNI -> SAX adapter
(which could well be in the Xerces 2.x codebase by now).
Otherwise we'd need a dom4j XNIReader. This should probably be done anyway,
so if anyone fancies diving in and writing this, it'd be really cool. I've
added it to the to-do list and will get there eventually.
James