In message "Re: [cp-patches] gnu/xml/transform/StreamSerializer.java: compatibilityMode setting" on 05/02/13, Chris Burdess <[EMAIL PROTECTED]> writes:
:> Unfortunately your patch is almost guaranteed to produce
:> non-well-formed XML.

OK, I do not insist on my patch, and I no longer use the patched program myself: I now output UTF-8 and convert it with "iconv -f UTF-8 -t EUC-JP".

From my own practical viewpoint, whether the produced XML is well-formed matters less than whether it is compact and human-readable. When I am handling a Japanese text, I can assume that only Japanese and ASCII characters appear in it. I understand that a widely used system like GNU Classpath cannot take this practical viewpoint and must make the safest choice.

:> I agree that compatibilityMode is a hack. What's really needed is a way
:> to detect whether a character is a valid member of a given encoding,

As for CJK characters, I cannot imagine a way of testing a character without having a table of all valid characters.

I used to use Saxon as an XSLT processor, and this is what Saxon does: Saxon itself supports only standard character encodings such as UTF-8 and ISO-8859-1, and relies on the java.nio.charset package to handle other character encodings. In addition, Saxon provides an API with which users can write their own character set handler, which tells whether a character is a valid member of a given encoding.

To satisfy my needs, I wrote my own Japanese character handler which lies and reports that all Unicode characters are Japanese characters, just as I set compatibilityMode for gnu/xml/transform/StreamSerializer to true.

I think this is a good idea. Saxon itself stays free from the risk of producing non-well-formed XML documents, and responsible users can do anything they like.
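
To illustrate the kind of per-character test discussed above, here is a minimal sketch that assumes the runtime's java.nio.charset implementation supports the target encoding, so that the charset's own encoder supplies the table of valid characters. The class name EncodableCheck and the method escapeUnencodable are hypothetical and are not part of GNU Classpath or Saxon. Only CharsetEncoder.canEncode is a standard API. Characters the encoder rejects are written as numeric character references instead of raw bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodableCheck {
    // Hypothetical helper: keeps characters the charset can encode and
    // replaces every other code point with a decimal character reference.
    // Lone surrogates are not treated specially in this sketch.
    public static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String s = new String(Character.toChars(cp));
            if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // US-ASCII is used here only because every Java runtime must provide it;
        // Charset.forName("EUC-JP") could be passed instead where it is available.
        System.out.println(escapeUnencodable("abc \u65e5\u672c", StandardCharsets.US_ASCII));
    }
}
```

With such a check, text that stays inside the target encoding is written compactly, and only characters outside it fall back to character references.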