On 11/2/06, Daniel John Debrunner <[EMAIL PROTECTED]> wrote:
Hmmm, the documentation for native2ascii does not agree with the statement that characters in the range 128-255 are converted into Unicode escapes. It says non-Latin-1 characters are converted, where Latin-1 is the common name for ISO8859-1.
Then the native2ascii documentation doesn't agree with what native2ascii actually does. :-) A quick scan through Derby's translated message files, which I converted from various encodings using native2ascii, shows that all the characters above 127 have been converted to Unicode escapes. Grep for \\u00[bcdef] in the directories with translated properties files to see examples.

I also have vague, years-old memories of testing translated properties files and finding that characters in the upper half of the ISO-8859-1 character set, while read correctly from the properties file, were not displayed correctly when output to the console. Those problems might be fixed by now, or might not; it probably depends on your JVM. Since I've only tested with ASCII properties files since then, I wouldn't know for sure. :-)

Anyway, what we really want to catch are files that haven't been run through native2ascii and are in some encoding that definitely won't work, like UTF-8 or SJIS. Bytes in the file with a value > 127 are one sign that that might be the case. There's probably a better way to figure out whether a file is in an encoding other than ASCII or ISO8859, but it may be more complicated than what we need. I'll do some searching around that.
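Just to illustrate the kind of check I have in mind, here's a rough sketch in Java that flags any file containing a byte above 127. The class name and messages are mine, not anything in the Derby tree, and it's not meant as the final approach -- just the simplest version of the byte test:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Sketch only: warn about properties files that may not have been
    // run through native2ascii, based on the presence of bytes > 127.
    public class NonAsciiCheck {
        public static void main(String[] args) throws IOException {
            for (String fileName : args) {
                try (InputStream in = new FileInputStream(fileName)) {
                    long offset = 0;
                    boolean clean = true;
                    int b;
                    while ((b = in.read()) != -1) {
                        if (b > 127) {
                            // First suspicious byte is enough to flag the file.
                            System.out.println(fileName + ": byte " + b
                                    + " at offset " + offset
                                    + " -- may not have been run through native2ascii");
                            clean = false;
                            break;
                        }
                        offset++;
                    }
                    if (clean) {
                        System.out.println(fileName + ": all bytes <= 127");
                    }
                }
            }
        }
    }

andrew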
