Re: [bitc-dev] Unicode and bitc

Jonathan S. Shapiro Wed, 13 Oct 2010 15:21:39 -0700

On Wed, Oct 13, 2010 at 1:46 AM, Ben Kloosterman <[email protected]> wrote:


> >For typical in-memory string manipulation, UCS-2 has served us well,
>
> I think this is just because UCS2 was the standard at the time and it was
> intended that documents use it.


There was a brief phase during Unicode-1.0 when a 16-bit external character
representation (not UCS2) was advocated. This died instantly because it
wasn't compatible with the overwhelming body of existing ASCII data in the
field.


> Most Asian chars can't be represented in UCS2 making it probably worse than
> the old Ascii encodings still in common use in Asia.
>

Actually, I'm not sure that's correct. What can't be represented in UCS2 is
the legacy encodings (Shift-JIS and the other one whose name I don't
remember). The major Asian languages do have representations in the 16-bit
encoding space.

That said, there is a *huge* volume of data in those two older
representations, which means that for reasons of *data* compatibility we
need the full code point space.
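A small Python sketch makes the distinction concrete. It assumes only the standard definition that UCS-2 covers code points U+0000..U+FFFF (the Basic Multilingual Plane), while anything above that needs a UTF-16 surrogate pair or a wider representation. The helper names fits_in_ucs2 and utf16_units and the sample characters are hypothetical, chosen for illustration: a common kanji and a Hangul syllable fall inside the BMP, and the rarer CJK ideograph U+20BB7 does not.

```python
# Illustrative sketch: which characters fit in UCS-2 (one 16-bit unit)?
# Helper names are hypothetical, not from any library.

def fits_in_ucs2(ch: str) -> bool:
    """UCS-2 can only encode code points U+0000..U+FFFF (the BMP)."""
    return ord(ch) <= 0xFFFF

def utf16_units(s: str) -> int:
    """Count 16-bit code units in UTF-16; a surrogate pair counts as 2."""
    return len(s.encode("utf-16-le")) // 2

samples = {
    "\u65e5": "common kanji",        # 日
    "\ud55c": "Hangul syllable",     # 한 (U+D55C is below the surrogate range)
    "\U00020BB7": "rare CJK ideograph",  # 𠮷, outside the BMP
}

for ch, desc in samples.items():
    print(f"U+{ord(ch):04X} {desc}: "
          f"UCS-2={fits_in_ucs2(ch)}, UTF-16 units={utf16_units(ch)}")
```

The first two characters need a single 16-bit unit either way; the third cannot be stored in UCS-2 at all, and UTF-16 spends two units (a surrogate pair) on it.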


shap

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
