It's been a little over two months since the last release of my CyberNeko Tools for XNI. I've used that time to add a lot of requested features and fix some bugs in the HTML parser. The changes include:
* CDATA scanning
* Namespace processing
* Settings to add or override namespace bindings
* Settings to add or override doctype declaration
* Filter to "purify" input to produce well-formed XML
* Newline scanning bug fix
* Infinite loop bug fix
To match browser behavior more closely, CDATA sections are reported as comments by default. If you set the CDATA section feature to true, however, the CDATA start/end boundaries are reported and the content is sent through the XNI pipeline as character content.
The namespace processing is provided as a way to handle documents with namespaces or add namespaces to documents that don't have them. This feature is not a replacement for a compliant XML parser when parsing XHTML documents, though.
Two pipeline components were added to the filters package: one to bind namespaces and another to "purify" the input in order to ensure that the output is well-formed XML. The NamespaceBinder component is automatically added to the parsing pipeline by the parser when the SAX namespaces feature is set to true, so you do not need to instantiate it directly.
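For readers unfamiliar with the SAX namespaces feature mentioned above, here is a minimal sketch of how that standard feature is switched on through the SAX2 XMLReader interface. It uses the JDK's built-in XML parser as a stand-in so that it runs without any extra jars; it does not use the NekoHTML parser or the NamespaceBinder itself. The class name NamespacesFeatureSketch and the helper rootNamespace are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: illustrates the standard SAX feature only.
final class NamespacesFeatureSketch {
    // Standard SAX2 feature identifier for namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Parses the given XML and returns the namespace URI of the root element.
    static String rootNamespace(String xml) {
        try {
            // Stand-in parser: the JDK's default SAX implementation.
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);

            final String[] uri = new String[1];
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String ns, String local,
                                         String qName, Attributes atts) {
                    if (uri[0] == null) {
                        uri[0] = ns; // remember only the root element
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return uri[0];
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<h:html xmlns:h='http://www.w3.org/1999/xhtml'/>";
        System.out.println(rootNamespace(doc));
    }
}
```

With a SAX-based HTML parser the same setFeature call would be made on that parser's XMLReader instead of the JDK one.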
The Purifier component does its best to fix problems that are not handled by the tag balancer. For example, the purifier will replace invalid XML characters appearing in the document by encoding them in an XML-safe way. Also, the purifier ensures that comments never contain "--", CDATA sections never contain "]]", and even makes sure that the root element name in the doctype declaration callback matches the document root element name.
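As a rough illustration of the kinds of fix-ups described above, here is a small self-contained sketch. It is not the Purifier source code: the class PurifierSketch, its method names, the "\uXXXX" escape format, and the choice of inserting a space to break up "--" and "]]" are all assumptions made for the example. Only the validity check follows a fixed rule, the Char production of XML 1.0.

```java
// Hypothetical sketch; the real Purifier may encode and repair differently.
final class PurifierSketch {

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces each invalid code point with a visible escape.
    // The "\\uXXXX" text format is an assumption for this example.
    static String escapeInvalidChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("\\u").append(String.format("%04X", cp));
            }
        }
        return out.toString();
    }

    // Makes sure comment text contains no "--" (assumed fix: insert a space).
    static String purifyComment(String comment) {
        String s = escapeInvalidChars(comment);
        // Loop because one pass over "---" still leaves a "--" behind.
        while (s.contains("--")) {
            s = s.replace("--", "- -");
        }
        return s;
    }

    // Makes sure CDATA content contains no "]]" (assumed fix: insert a space).
    static String purifyCdata(String content) {
        String s = escapeInvalidChars(content);
        while (s.contains("]]")) {
            s = s.replace("]]", "] ]");
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(purifyComment("a--b"));
        System.out.println(purifyCdata("x[y[0]]"));
        System.out.println(escapeInvalidChars("bad" + (char) 0x1 + "char"));
    }
}
```

Under these assumptions, the comment text "a--b" comes out as "a- -b", and a stray control character such as U+0001 is turned into the six visible characters \u0001.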
As always, you can find the latest downloads at my Apache website:
http://www.apache.org/~andyc/neko/doc/index.html
Enjoy!
-- Andy Clark * [EMAIL PROTECTED]