I would certainly agree with that. Since 32-bit chars are clearly unacceptable from a storage point of view, that leaves you with just UTF-8 (UTF-16 is not fixed-width either, and it uses more memory for mostly-ASCII text). Other forms, such as a fixed 16-bit representation for legacy algorithms, could be provided separately, maybe via something like GetFixedCharArray. Considering that a lot of legacy C string code relies on mutability and arrays/pointers, doing this via an array is not a bad idea, with maybe a very simple subset in the library for char[] string work.
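As a rough illustration of the storage trade-off, here is a minimal sketch in Python (purely for illustration). It measures the same strings in UTF-8, UTF-16 and UTF-32 and counts UTF-16 code units. `get_fixed_char_array` is a hypothetical stand-in for the GetFixedCharArray idea, assumed to reject characters that do not fit in 16 bits rather than silently truncating them.

```python
# Minimal sketch: storage cost of the same text in UTF-8, UTF-16 and UTF-32,
# plus a hypothetical fixed 16-bit (UCS-2 style) array view.


def encoded_sizes(text: str) -> dict[str, int]:
    """Bytes needed for `text` in each encoding (-le codecs add no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def utf16_code_units(text: str) -> int:
    """UTF-16 is not fixed-width: characters outside the BMP take two units."""
    return len(text.encode("utf-16-le")) // 2


def get_fixed_char_array(text: str) -> list[int]:
    """Hypothetical GetFixedCharArray: one 16-bit value per character.

    Raises ValueError for characters above U+FFFF instead of truncating,
    because a fixed 16-bit array (UCS-2) cannot represent them.
    """
    chars = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            raise ValueError(f"U+{code_point:X} does not fit in 16 bits")
        chars.append(code_point)
    return chars


if __name__ == "__main__":
    samples = {
        "ASCII": "hello, world",     # utf-8 12, utf-16 24, utf-32 48 bytes
        "CJK (BMP)": "你好世界",      # utf-8 12, utf-16 8, utf-32 16 bytes
        "CJK Ext. B": "\U00020000",  # 4 bytes each; 2 UTF-16 code units
    }
    for label, text in samples.items():
        print(label, encoded_sizes(text), utf16_code_units(text))
    print(get_fixed_char_array("hi"))  # [104, 105]
```

Note that for the CJK sample UTF-8 is larger than UTF-16, so the memory argument for UTF-8 holds mainly for ASCII-heavy text.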
UCS-2 and Chinese are especially a mess: there are over 70,000 Chinese (CJK) characters in Unicode, more than UCS-2's 65,536 code points can hold, and new characters are added every year. This is made worse by the way traditional characters were unified (Han unification): basically the same Unicode code point is used for a character in Taiwan, Hong Kong and Japan, but the meaning is not always 1:1... oops.

Regards,
Ben

>-----Original Message-----
>From: [email protected] [mailto:[email protected]]
>On Behalf Of William Leslie
>Sent: Thursday, October 14, 2010 7:02 AM
>To: Discussions about the BitC language
>Subject: Re: [bitc-dev] Unicode and bitc
>
>On 14 October 2010 09:57, Ben Kloosterman <[email protected]> wrote:
>> I don't think data compatibility is an issue; I'm not sure I understand
>> this. All systems have ways of converting to and from native formats.
>
>The data here is the text - shap is saying that the native string type
>should be able to represent any Unicode we can throw at it (so no
>UCS-2).
>
>--
>William Leslie
>
>_______________________________________________
>bitc-dev mailing list
>[email protected]
>http://www.coyotos.org/mailman/listinfo/bitc-dev
