Ludovic Rousseau wrote:
> On 25/04/06, Karsten Ohme <[EMAIL PROTECTED]> wrote:
> 
>>Ludovic Rousseau wrote:
>>
>>>I don't know why MS chose to use UTF-16 instead of UTF-8. UTF-8 is
>>>backward compatible with ASCII so (very) easy to migrate to.
>>
>>For most languages this would cause trouble. E.g. Asian languages use
>>two bytes. So independent of the locale the programmer can allocate two
>>bytes per character (actually a TCHAR, if UNICODE is defined). With UTF-8
>>you must parse the string (get the string length) to get the real physical
>>size of the string, because ASCII is coded in the seven lower bits and the
>>MSB signals that another byte follows to complete the character. I assume
>>this is one reason why it seems to be simpler.
> 
> 
> You would also need to parse the string to get its real length even
> with UTF-16. According to [1] some Unicode characters are coded on
> 4 bytes. So just dividing the array length by 2 to get the string
> length may not work. If you want a simple transformation you should
> use UTF-32, since any Unicode character can be represented in exactly
> 32 bits. But I am not a Unicode expert.
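
To make the size argument concrete, here is a small Python sketch. The
function name encoded_sizes and the sample characters are only
illustrative assumptions, not anything from the MUSCLE code. It prints
how many bytes one character takes in each encoding form:

```python
# Sketch: compare the encoded size of single characters in each UTF form.
# The "-le" codecs are used so that no byte order mark is counted.
samples = {
    "ASCII A": "A",                  # U+0041
    "Latin e-acute": "\u00e9",       # U+00E9
    "CJK ideograph": "\u4e2d",       # U+4E2D, inside the BMP
    "Musical G clef": "\U0001d11e",  # U+1D11E, outside the BMP
}

def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

for label, ch in samples.items():
    print(label, encoded_sizes(ch))
```

Only the UTF-32 column is constant. The last sample needs 4 bytes in
UTF-16 as well, so dividing the byte length by 2 overcounts it.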

Mmmh, if you allocate memory for a Unicode string, two bytes are always
used per character, so Microsoft uses UCS-2 (search Google for
"Microsoft UCS-2"). UCS-2 covers Plane 0, the Basic Multilingual Plane
(BMP), where every character has a fixed two-byte code value. Java uses
UTF-16.
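
As a rough illustration of the BMP point (Python, with helper names
utf16_units and count_code_points that I made up for this sketch): a
character inside the BMP is a single 16-bit unit, while UTF-16 encodes
anything outside the BMP as a pair of surrogate units, which a
fixed-width two-byte scheme cannot represent as one character.

```python
def utf16_units(text):
    """Return the 16-bit code units of text as a list of ints."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

def count_code_points(units):
    """Count characters in a UTF-16 unit list, treating a surrogate pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A high surrogate followed by a low surrogate forms one character.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count

units = utf16_units("a\U0001d11eb")   # 'a', U+1D11E (outside the BMP), 'b'
print(len(units), count_code_points(units))
```

So counting 16-bit units matches the number of characters only as long
as the text stays inside the BMP.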

Karsten
> 
> GTK+ 2.x uses UTF-8 only and proposes a function g_utf8_strlen [2] to
> get the string length.
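
The idea behind such a length function can be sketched in a few lines
of Python. This is not GLib's implementation, and utf8_char_count is a
hypothetical name; it only assumes the input is valid UTF-8 and counts
every byte that is not a continuation byte:

```python
def utf8_char_count(data):
    """Count code points in valid UTF-8 bytes by skipping continuation bytes."""
    # Continuation bytes have the form 10xxxxxx; any other byte starts a character.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)

encoded = "h\u00e9llo \u4e2d\U0001d11e".encode("utf-8")
print(len(encoded), utf8_char_count(encoded))
```

The byte length and the character count differ as soon as the string
leaves ASCII, which is why the string has to be walked once.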
> 
> 
>>Java also uses UTF-16.
> 
> 
> Maybe not a good example? :-)
> 
> Thanks
> 
> [1] http://en.wikipedia.org/wiki/UTF-16
> [2] http://developer.gnome.org/doc/API/2.0/glib/glib-Unicode-Manipulation.html#g-utf8-strlen
> 
> --
>   Dr Ludovic Rousseau
> 
> _______________________________________________
> Muscle mailing list
> [email protected]
> http://lists.drizzle.com/mailman/listinfo/muscle
