Yeah, it's an old thread...

begin quoting Andrew Lentvorski as of Thu, Jan 11, 2007 at 04:07:52PM -0800:
> Stewart Stremler wrote:
> >begin quoting Andrew Lentvorski as of Thu, Jan 11, 2007 at 03:53:58PM
> >>And, for those who don't realize it, UTF-8 is backward compatible with
> >>ASCII handling.
> >
> >ANSI handling, I thought.
>
> ASCII.
>
> The range from 0-127 is precisely the same whether UTF-8 or ASCII.
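
The quoted claim about the 0-127 range is easy to check. Here is a minimal sketch in Python; the helper name `same_bytes_in_both` and the sample strings are made up for illustration and do not come from the thread.

```python
def same_bytes_in_both(text: str) -> bool:
    """True if text encodes to identical bytes as ASCII and as UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # Not representable in ASCII at all, so there is nothing to compare.
        return False


# Every code point in 0-127 is a single byte with the same value in both.
all_ascii = "".join(chr(i) for i in range(128))
print(same_bytes_in_both(all_ascii))   # True
print(same_bytes_in_both("naïve"))     # False

# Outside that range UTF-8 uses only bytes >= 0x80, so part of an encoded
# non-ASCII character can never be mistaken for an ASCII character.
print(all(b >= 0x80 for b in "ï".encode("utf-8")))   # True
```

This only shows that ASCII data is already valid UTF-8. It says nothing about whether a routine written for ASCII does something sensible with the bytes above 0x7F.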

Erm. I didn't interpret your assertion quite that way. I read it as
"routines that handle ASCII can handle UTF8", which isn't necessarily
true, rather than as "routines that handle UTF8 can transparently
handle ASCII".

> Since UTF-8 ensures that NUL(0) is never part of an encoded sequence, it
> generally survives ANSI processing, as well, but it is not backward
> compatible with it. It is, in fact, the *only* encoding which has this
> property.

UTF-7 does, so far as I can tell. And UTF-7 is *truly* backwards-compatible
with ASCII handling routines. It just has some annoying "features", like
using + as the escape character.

> This is why I harp on UTF-8. It generally continues to work even when
> fed to an Ancient Crufty C Program(tm) since it doesn't break the C
> string handling routines.

I'd rather have UTF8 than the string / wide-string distinction that comes
with UTF-16 and friends.

The *most* annoying thing about Java is that you have to be careful when
reading/writing files to ensure that you're using the correct size for
characters.

--
Just ran across a mention of the UTF-7 RFC the other day.
Stewart Stremler

--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
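
For anyone who wants to poke at the NUL and UTF-7 points, here is a rough sketch in Python. The `c_strlen` helper is a made-up stand-in for C's `strlen()`, not a real library call, and the sample string is arbitrary. The `utf-7` codec is the one shipped in Python's standard library.

```python
def c_strlen(data: bytes) -> int:
    """Mimic C's strlen(): count bytes up to the first NUL."""
    end = data.find(b"\x00")
    return len(data) if end == -1 else end


text = "héllo"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
utf7 = text.encode("utf-7")

# UTF-8: no embedded NUL, so strlen() sees the whole buffer...
print(c_strlen(utf8) == len(utf8))       # True
# ...but it counts bytes, not characters (6 bytes for 5 characters).
print(len(utf8), len(text))              # 6 5

# UTF-16: the high byte of 'h' is NUL, so strlen() stops after one byte.
print(c_strlen(utf16))                   # 1

# UTF-7: every byte is 7-bit ASCII and none of them is NUL.
print(all(0 < b < 0x80 for b in utf7))   # True

# The annoying part: a literal '+' has to be escaped as '+-'.
print("a+b".encode("utf-7"))             # b'a+-b'
```

The byte count versus character count mismatch is the sort of thing meant by an ASCII routine not necessarily handling UTF8: nothing breaks at the C string level, but anything that assumes one byte per character gets the wrong answer.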
