Yeah, it's an old thread...

begin  quoting Andrew Lentvorski as of Thu, Jan 11, 2007 at 04:07:52PM -0800:
> Stewart Stremler wrote:
> >begin  quoting Andrew Lentvorski as of Thu, Jan 11, 2007 at 03:53:58PM 
> >>And, for those who don't realize it, UTF-8 is backward compatible with 
> >>ASCII handling.
> >
> >ANSI handling, I thought.
> 
> ASCII.
> 
> The range from 0-127 is precisely the same whether UTF-8 or ASCII.

Erm. I didn't interpret your assertion quite that way.

I read it as "routines that handle ASCII can handle UTF-8", which
isn't necessarily true, rather than as "routines that handle UTF-8 can
transparently handle ASCII".
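
To make the distinction concrete, here is a rough Python 3 sketch. The
byte_length() helper is hypothetical: it stands in for any ASCII-era
routine that assumes one byte per character. The sample strings are
made up for illustration.

```python
# Sketch only: byte_length() is a hypothetical stand-in for an ASCII-era
# routine that assumes one byte per character.

def byte_length(data: bytes) -> int:
    """Count 'characters' the way a one-byte-per-character routine would."""
    return len(data)

ascii_text = "plain text"
mixed_text = "na\u00efve"  # 'naive' with a diaeresis on the i (one non-ASCII character)

# Pure ASCII text produces identical bytes under both encodings, so a
# UTF-8 routine handles ASCII input transparently.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# The reverse does not hold: the byte-counting routine overcounts as soon
# as a multi-byte sequence appears.
char_count = len(mixed_text)                            # characters in the string
byte_count = byte_length(mixed_text.encode("utf-8"))    # bytes the old routine sees

# A strict ASCII decoder rejects the UTF-8 bytes outright.
try:
    mixed_text.encode("utf-8").decode("ascii")
    ascii_accepts_utf8 = True
except UnicodeDecodeError:
    ascii_accepts_utf8 = False

print(same_bytes, char_count, byte_count, ascii_accepts_utf8)
```

Whether an old routine merely miscounts or actually fails depends on
what it does with bytes above 127, which is the "isn't necessarily
true" part.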

> Since UTF-8 ensures that NUL(0) is never part of an encoded sequence, it 
> generally survives ANSI processing, as well, but it is not backward 
> compatible with it.  It is, in fact, the *only* encoding which has this 
> property.

UTF-7 has that property too, so far as I can tell.

And UTF-7 is *truly* backwards-compatible with ASCII handling routines.
It just has some annoying "features", like using + as the escape
character.
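
A quick way to check this, assuming Python 3 and its built-in "utf-7"
codec (the sample text is made up):

```python
# Sketch: compare UTF-7 and UTF-8 output for the same text.
text = "caf\u00e9 + cr\u00e8me"

utf7 = text.encode("utf-7")
utf8 = text.encode("utf-8")

# UTF-7 output stays within 7-bit ASCII; UTF-8 uses bytes above 127
# for the accented characters.
utf7_is_7bit = all(b < 0x80 for b in utf7)
utf8_is_7bit = all(b < 0x80 for b in utf8)

# Neither encoding emits a NUL byte for text that contains no NUL.
no_nul = 0 not in utf7 and 0 not in utf8

# The annoying "feature": '+' starts an escaped run, so a literal '+'
# has to be escaped itself.
plus_escaped = "+".encode("utf-7")

# Decoding the UTF-7 bytes gives back the original text.
round_trip = utf7.decode("utf-7") == text

print(utf7, utf7_is_7bit, utf8_is_7bit, no_nul, plus_escaped, round_trip)
```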

> This is why I harp on UTF-8.  It generally continues to work even when 
> fed to an Ancient Crufty C Program(tm) since it doesn't break the C 
> string handling routines.
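
A small Python 3 sketch of why the C string routines survive UTF-8 but
not UTF-16. The c_strlen() helper is hypothetical: it only mimics what
strlen() does, namely counting bytes up to the first NUL.

```python
# Sketch only: c_strlen() is a hypothetical imitation of C's strlen().

def c_strlen(buf: bytes) -> int:
    """Count bytes up to (not including) the first NUL, like strlen()."""
    end = buf.find(b"\x00")
    return len(buf) if end == -1 else end

text = "h\u00e9llo"  # 'hello' with an acute accent on the e

# NUL-terminated buffers, as a C program would see them.
utf8_buf = text.encode("utf-8") + b"\x00"
utf16_buf = text.encode("utf-16-le") + b"\x00\x00"

# UTF-8: no embedded NUL, so the whole string is seen (in bytes).
utf8_len = c_strlen(utf8_buf)

# UTF-16: the high byte of 'h' is 0x00, so the string appears to end
# after the first byte.
utf16_len = c_strlen(utf16_buf)

print(utf8_len, utf16_len)
```

The UTF-8 length is a byte count, not a character count, but the
string arrives intact, which is all an old program needs to pass it
along.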

I'd rather have UTF-8 than the string / wide-string distinction that
comes with UTF-16 and friends.  The *most* annoying thing about Java
is that you have to be careful when reading/writing files to ensure
that you're using the correct size for characters.

-- 
Just ran across a mention of the UTF-7 RFC the other day.
Stewart Stremler


-- 
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
