Andrew McIntyre wrote:
A quick scan through Derby's translated message files, converted by me
from various encodings using native2ascii, shows that all the
characters above 128 have been converted to Unicode Escapes. Grep for
\\u00[bcdef] in the directories with translated properties files to
see examples.
Looking further, I now see that the properties file format is more than
just ISO8859-1 encoding with unicode escapes. The javadoc for
Properties.store says much more about which characters are escaped,
including that:
"Characters less than \u0020 and characters greater than \u007E are
written as \uxxxx for the appropriate hexadecimal value xxxx. "
This matches what Andrew sees in the Derby files.
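As a quick illustration of that javadoc rule, here is a minimal sketch that stores one property whose value contains a character above 0x7E and returns the raw output. The class name StoreEscapeDemo, the helper storeToString and the sample key/value are made up for this example; only java.util.Properties.store itself comes from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class StoreEscapeDemo {
    // Hypothetical helper: stores a single key/value pair and returns the
    // raw file content, decoded as ISO-8859-1 so each byte maps to one char.
    static String storeToString(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.store(out, null); // null: no user comment line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The sample value ends in e-acute (U+00E9), which is above U+007E,
        // so it should appear in the output as a six-character escape.
        System.out.print(storeToString("greeting", "caf\u00e9"));
    }
}
```

Running it should print a date comment line followed by the property line, with the accented character written as an escape rather than as a raw byte.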
So any checks should be driven off that description only, and
native2ascii and the JLS have no relevance.
So checking for non-ASCII byte values in the raw stream is the right
general idea, but the details need to be more specific, e.g. I think any
characters in the range 0x00-0x1f (which are ASCII) and 0x7f-0xff are
invalid, and there may be others.
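A rough sketch of such a check, reading the file as raw bytes rather than through Properties.load, might look like the following. The class name RawPropertiesCheck and its methods are hypothetical. The sketch also assumes that tab, line feed, form feed and carriage return have to be let through, since the file's own line terminators and whitespace fall in the 0x00-0x1f range; that exception is my reading of the format, not something settled above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class RawPropertiesCheck {
    // Assumption: printable ASCII (0x20-0x7e) is fine, plus the whitespace
    // and line-terminator bytes a properties file needs for its layout.
    static boolean isAllowed(int b) {
        if (b >= 0x20 && b <= 0x7e) {
            return true;
        }
        return b == '\t' || b == '\n' || b == '\f' || b == '\r';
    }

    // Returns the offsets of every byte outside the assumed legal set.
    static List<Integer> findSuspectOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xff; // treat the byte as unsigned
            if (!isAllowed(b)) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of a properties file to scan.
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            for (int offset : findSuspectOffsets(data)) {
                System.out.printf("%s: suspect byte 0x%02x at offset %d%n",
                        name, data[offset] & 0xff, offset);
            }
        }
    }
}
```

It only reports offsets; it does not try to validate the escape sequences themselves, which would need a separate pass.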
Thanks,
Dan.