I'm happy to announce that mod_xml2enc is now ready for use.

mod_xml2enc is designed to be used with libxml2-based filter modules, such as:

    mod_accessibility
    mod_proxy_html
    mod_publisher
    mod_transform
    mod_xml2
    mod_xslt

and serves to improve their internationalisation support:
(1) It sniffs the encoding of incoming documents, using HTTP headers where available, or XML or HTML rules where there is no HTTP information.
(2) If a character set is not supported by libxml2, it converts the document to UTF-8 ahead of the markup filter.
(3) It removes any encoding information that is invalidated by the processing, and substitutes a correct HTTP header.

To take advantage of this, filter modules should use the xml2enc_charset optional function to retrieve the charset argument to pass to the libxml2 parser. Note that you may have to handle APR_EAGAIN if your module sets up the parser before mod_xml2enc has been able to sniff the first data. I'll be updating the published versions of my filter modules to use it as round tuits permit.

Filter modules can also postprocess their output into a different charset again, using the xml2enc_filter optional function.

Additional capabilities are preprocessing of bad HTML (a function introduced in mod_proxy_html 3, but also relevant to other HTML modules) and an additional optional hook for preprocessing. These extra functions are untested. Developers, feel free to explore and send feedback!

http://apache.webthing.com/mod_xml2enc/

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/