It is correct (I live in China at the moment), and it is why China has made it
illegal for systems to represent only the basic plane (UCS-2). While most
characters can be used, it's a bit like saying that in English you may only use
a 20,000-word dictionary. This was one of the main drivers for UTF-16 and UTF-32.
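To make the basic-plane point concrete, here is a minimal Python sketch. The helpers `utf16_units` and `fits_ucs2` are hypothetical and exist only for illustration. UCS-2 gives each character exactly one 16-bit unit, so it cannot go past U+FFFF. UTF-16 reaches the supplementary planes with surrogate pairs. U+4E2D (中) is a BMP ideograph. U+20000, the first CJK Unified Ideographs Extension B character, lies outside the basic plane.

```python
def utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: a single 16-bit unit, identical to UCS-2.
        return [cp]
    # Supplementary planes: split the 20-bit offset into a surrogate pair.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def fits_ucs2(cp: int) -> bool:
    """UCS-2 has no surrogate mechanism, so only BMP characters fit."""
    return cp <= 0xFFFF and not 0xD800 <= cp <= 0xDFFF


for cp in (0x4E2D, 0x20000):
    units = " ".join(f"{u:04X}" for u in utf16_units(cp))
    print(f"U+{cp:04X}: UTF-16 units [{units}], fits UCS-2: {fits_ucs2(cp)}")
```

Python's built-in codec agrees. `chr(0x20000).encode("utf-16-be")` yields the same pair, D840 DC00, a character that a UCS-2-only system has no way to store.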

I don't think data compatibility is an issue; I'm not sure I understand this
point. All systems have ways of converting to and from native formats.

Ben

On Wed, Oct 13, 2010 at 1:46 AM, Ben Kloosterman <[email protected]> wrote:

>For typical in-memory string manipulation, UCS-2 has served us well,

I think this is just because UCS-2 was the standard at the time, and it was
intended that documents use it.


There was a brief phase during Unicode-1.0 when a 16-bit external character
representation (not UCS2) was advocated. This died instantly because it
wasn't compatible with the overwhelming body of existing ASCII data in the
field.

Most Asian characters can't be represented in UCS-2, which probably makes it
worse than the old ASCII encodings still in common use in Asia.


Actually, I'm not sure that's correct. What can't be represented in UCS-2 is
the legacy encoding (Shift-JIS and the other one whose name I don't
remember). The major Asian languages do have representations in the 16-bit
encoding space.

That said, there is a huge volume of data in those two older
representations, which means that for reasons of data compatibility we need
the full code point space.


shap

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
