It is correct (I live in China at the moment), and it is why China has made it illegal for systems to represent only the basic plane (UCS2). While most chars can be used, it's a bit like saying that in English you can only use a 20,000-word dictionary. This was one of the main drivers for UTF-16 and UTF-32.
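To illustrate the basic-plane limit, here is a minimal Python sketch. The helper name `fits_in_ucs2` is made up for illustration, and U+20000 is just one example of a CJK ideograph that lies outside the basic plane; it is not taken from any particular system.

```python
# A CJK ideograph from Extension B, outside the Basic Multilingual Plane.
ch = "\U00020000"

def fits_in_ucs2(s):
    """Hypothetical helper: True if every code point fits in one 16-bit unit."""
    return all(ord(c) <= 0xFFFF for c in s)

# UTF-16 needs a surrogate pair (two 16-bit units) for this character;
# UTF-32 uses a single 32-bit unit.
utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")

print(fits_in_ucs2("\u4e2d"))  # a common ideograph inside the basic plane
print(fits_in_ucs2(ch))
print(utf16.hex(), len(utf16) // 2, "units")
print(utf32.hex(), len(utf32) // 4, "unit")
```

A UCS2-only system has no way to store the second character at all, while UTF-16 and UTF-32 both can.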
I don't think data compatibility is an issue. I'm not sure I understand this; all systems have ways of converting to and from native formats.

Ben

On Wed, Oct 13, 2010 at 1:46 AM, Ben Kloosterman <[email protected]> wrote:

>For typical in-memory string manipulation, UCS-2 has served us well,

I think this is just because UCS2 was the standard at the time and it was intended that documents use it.

There was a brief phase during Unicode-1.0 when a 16-bit external character representation (not UCS2) was advocated. This died instantly because it wasn't compatible with the overwhelming body of existing ASCII data in the field.

Most Asian chars can't be represented in UCS2, making it probably worse than the old ASCII encodings still in common use in Asia.

Actually, I'm not sure that's correct. What can't be represented in UCS2 is the legacy encoding (Shift-JIS and the other one whose name I don't remember). The major Asian languages do have representations in the 16-bit encoding space. That said, there is a huge volume of data in those two older representations, which means that for reasons of data compatibility we need the full code point space.

shap
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
