On Wed, Oct 13, 2010 at 1:46 AM, Ben Kloosterman <[email protected]> wrote:
> >For typical in-memory string manipulation, UCS-2 has served us well,
>
> I think this is just because UCS2 was the standard at the time and it was
> intended that documents use it.

There was a brief phase during Unicode-1.0 when a 16-bit external character
representation (not UCS2) was advocated. This died instantly because it
wasn't compatible with the overwhelming body of existing ASCII data in the
field.

> Most Asian chars can't be represented in UCS2 making it probably worse than
> the old Ascii encodings still in common use in Asia.

Actually, I'm not sure that's correct. What can't be represented in UCS2 is
the legacy encoding (shift-JIS and the other one whose name I don't
remember). The major Asian languages do have representations in the 16-bit
encoding space. That said, there is a *huge* volume of data in those two
older representations, which means that for reasons of *data* compatibility
we need the full code point space.

shap
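
To make the distinction between the 16-bit encoding space and the full code
point space concrete, here is a minimal sketch, assuming Python 3. The helper
names fits_in_ucs2 and utf16_units are hypothetical and chosen only for
illustration. It checks whether a code point fits in a single 16-bit unit
and shows the UTF-16 code units Python emits for it.

```python
def fits_in_ucs2(ch):
    """Return True if the single code point fits in one 16-bit unit.

    Illustrative helper (not from the thread): UCS-2 can only address
    code points up to U+FFFF.
    """
    return ord(ch) <= 0xFFFF


def utf16_units(ch):
    """Return the UTF-16 code units for one code point, as integers."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Sample code points: ASCII, a common CJK ideograph, and a CJK Extension B
# ideograph that lies beyond U+FFFF.
samples = ["A", "\u65e5", "\U00020BB7"]

for ch in samples:
    units = " ".join(f"{u:04X}" for u in utf16_units(ch))
    print(f"U+{ord(ch):04X} fits_in_ucs2={fits_in_ucs2(ch)} utf16=[{units}]")
```

A code point above U+FFFF needs two 16-bit units (a surrogate pair) in
UTF-16, so a strict one-unit-per-character UCS-2 representation cannot hold
it, while the common ideograph U+65E5 fits in a single unit.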
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
