On Fri, Sep 21, 2001 at 04:52:51PM -0700, Paul Prescod wrote:
> > Urgh, this is tricky. Once you move outside of the BMP, the encodings you
> > *really* want to work stop working.
>
> Don't follow.
UCS-2 is only defined for characters inside the Basic Multilingual Plane;
UTF-16 has to use surrogates for non-BMP characters, and that sucks too
because what used to be a nice fixed-width encoding has suddenly gone
variable-width on you. You didn't want that to happen. UTF-8 is
variable-width too, which is screaming difficult to process.
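
To make the width problem concrete, here is a rough sketch in Python. The
helper names encoded_widths and utf16_surrogates are made up for this
illustration; only str.encode and the codec names come from the standard
library. It counts how many code units one character needs in each encoding,
and splits a non-BMP code point into its UTF-16 surrogate pair.

```python
def encoded_widths(ch):
    """Count the code units one character needs in each encoding.

    Hypothetical helper for illustration. The "-be" codec variants are
    used so that no byte-order mark is added to the output.
    """
    return {
        "utf-8 bytes": len(ch.encode("utf-8")),
        "utf-16 units": len(ch.encode("utf-16-be")) // 2,
        "utf-32 units": len(ch.encode("utf-32-be")) // 4,
    }


def utf16_surrogates(code_point):
    """Split a non-BMP code point into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need surrogates")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


if __name__ == "__main__":
    bmp_char = "\u65e5"          # U+65E5, inside the BMP
    non_bmp_char = "\U00020000"  # U+20000, outside the BMP
    print(encoded_widths(bmp_char))
    print(encoded_widths(non_bmp_char))
    print([hex(unit) for unit in utf16_surrogates(0x20000)])
```

The BMP character fits in one 16-bit unit, while the non-BMP one needs two
UTF-16 units and four UTF-8 bytes, so anything that indexes by code unit has
to scan rather than jump.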
> > Why will they bother screaming loud enough? Unicode doesn't do what they want
> > and JIS/SJIS/EUC/whatever does.
>
> But where do they get their software?
Oh, I forget there are non-Unix platforms. :) I dunno what things like
Ichitaro use for a file format.
> other than Java internally with their recent APIs. So I'd like to know
> more about whether Japanese and Chinese people are really using
> something other than Unicode or whether they are just using variant
> encodings for data that their software treats internally as Unicode.
I have a very strong suspicion it depends on the nationality of the
programmer. :) (And we're supposed to be generating programming languages
for programmers...)
--
Resist the urge to start typing; thinking is a worthwhile alternative.
-- Kernighan and Pike