On Nov 27, 2004, at 4:48 PM, Ceki Gülcü wrote:
Regarding non-ASCII characters for file names, logger names etc, Joran just takes in the data supplied by the parser. It does not attempt to perform any character conversions. Anyway, isn't correctly decoding the character data the responsibility of the XML parser?




Java is spoiling you: it declares that char and String are always UTF-16. In C++ you have to be aware of the expected encoding and apply it consistently, especially when using char and std::string. Previous experience with the log4cxx code base does not give me confidence that the existing DOMConfigurator gets everything right. Even if it does, there is a non-zero chance that I'd break things when moving to APR's XML support.

For example, xmlChar in libxml2 is a UTF-8 byte. If a string returned from libxml2 is passed directly (which would not cause any compiler errors or warnings) to an operating system call (say fopen) and the default operating system encoding is not UTF-8, then the file name may be corrupted. Since 0x00-0x7F specify US-ASCII characters in UTF-8 and most single byte encodings, the problem would not be apparent until a filename with a non-US-ASCII character was specified in the XML configuration file.
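To make the failure mode concrete, here is a minimal sketch. It assumes ISO-8859-1 as a stand-in for "the default operating system encoding", and the helper names isUsAscii and utf8ToLatin1 are hypothetical; neither is part of libxml2, APR, or log4cxx.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true when every byte is in 0x00-0x7F, i.e. the
// string has the same bytes in UTF-8 and in most single-byte encodings.
bool isUsAscii(const std::string& utf8) {
    for (unsigned char c : utf8) {
        if (c > 0x7F) return false;
    }
    return true;
}

// Hypothetical helper: transcode UTF-8 to ISO-8859-1 (assumed here to be
// the platform encoding). Returns false for malformed input or for
// characters that ISO-8859-1 cannot represent.
bool utf8ToLatin1(const std::string& utf8, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c <= 0x7F) {
            // US-ASCII bytes are identical in both encodings.
            out.push_back(static_cast<char>(c));
            continue;
        }
        // U+0080..U+00FF use lead byte 0xC2 or 0xC3 plus one continuation byte.
        if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
            unsigned char c2 = static_cast<unsigned char>(utf8[i + 1]);
            if ((c2 & 0xC0) == 0x80) {
                unsigned cp = ((c & 0x1Fu) << 6) | (c2 & 0x3Fu);
                assert(cp >= 0x80 && cp <= 0xFF);
                out.push_back(static_cast<char>(cp));
                ++i;  // skip the continuation byte just consumed
                continue;
            }
        }
        return false;
    }
    return true;
}
```

A name such as "app.log" passes through unchanged, so handing the raw UTF-8 bytes to fopen happens to work. A name containing U+00E9 arrives from the parser as the two bytes 0xC3 0xA9, while a Latin-1 file system expects the single byte 0xE9, so the untranscoded string names a different file.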

Since I think it is likely that the current DOMConfigurator has these kinds of problems, or that I would introduce them if it doesn't, it seems expedient to port JoranConfigurator and do it right instead of trying to modify DOMConfigurator.


