Re: [Muscle] problem installing pycsc

Karsten Ohme Wed, 26 Apr 2006 18:40:41 -0700

Ludovic Rousseau wrote:
> On 26/04/06, Peter Tomlinson <[EMAIL PROTECTED]> wrote:
> 
>>Ludovic Rousseau wrote:
>>
>>>According to [1], some Unicode characters are encoded in
>>>4 bytes.
>>>[1] http://en.wikipedia.org/wiki/UTF-16
>>
>>You should consult ISO 10646 [1].
>>
>>The advice I was given when I had to incorporate multiple
>>character sets into eURI [2] was that it is satisfactory to restrict an
>>implementation to UTF-16, as that covers all written scripts in
>>commercial and government use. But designers should state that UTF-16
>>is used in their work (I'm not sure that I made that clear in eURI...).
> 
> 
> I think I know why Microsoft and Java use UCS-2: Unicode 1.0 was only
> 16 bits [1].
> 
> But I don't see why UTF-16 is better than UTF-8 if the choice is made
> _now_. Maybe because functions to manipulate UTF-8 are not available
> in Windows and Java?
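To make the 2-byte versus 4-byte question concrete, here is a minimal Java sketch that prints how many bytes a few sample characters take in UTF-8 and in UTF-16. It assumes Java 7 or later for java.nio.charset.StandardCharsets (which postdates this thread); the class name EncodingSizes and the sample characters are hypothetical choices for illustration. UTF-16BE is used instead of UTF-16 because Java's UTF-16 encoder prepends a byte-order mark, which would inflate the count.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes needed to encode s in UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to encode s in UTF-16, big-endian, without a BOM.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                     // U+0041, ASCII
            "\u00E9",                                // U+00E9, e with acute accent
            "\u20AC",                                // U+20AC, euro sign
            new String(Character.toChars(0x1D11E))   // U+1D11E, musical G clef (outside the BMP)
        };
        for (String s : samples) {
            int cp = s.codePointAt(0);
            System.out.printf("U+%04X  UTF-8: %d bytes  UTF-16: %d bytes%n",
                    cp, utf8Bytes(s), utf16Bytes(s));
        }
    }
}
```

For these samples UTF-8 is smaller for ASCII (1 byte vs 2), equal for U+00E9 (2 vs 2) and larger for U+20AC (3 vs 2), while both encodings need 4 bytes for U+1D11E, which UTF-16 stores as a surrogate pair and UCS-2 cannot represent at all.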


For Windows, see MultiByteToWideChar() and WideCharToMultiByte(). Java
has UTF-8 support as well; UTF-8 must be specified explicitly as the
encoding, and then it can be handled. For Windows, I think the reason is
the fixed size of two bytes per character (as in UCS-2), which makes
string manipulation routines faster.
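On the Java side, here is a minimal sketch of what "specified as the encoding" means, again assuming Java 7 or later for StandardCharsets; the class Utf16Units and its helper methods are hypothetical names for illustration. It decodes UTF-8 bytes with an explicit charset instead of relying on the platform default, then shows that String.length() counts UTF-16 code units, so a character outside the BMP occupies two Java chars. The fixed two-byte size therefore holds per code unit, not per character, once surrogate pairs appear.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Units {
    // Decodes UTF-8 bytes into a String. The charset is named explicitly;
    // the single-argument String(byte[]) constructor would use the platform default.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Number of UTF-16 code units (Java chars) in s.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (characters) in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "A" followed by U+1D11E, encoded as UTF-8 (F0 9D 84 9E for the clef).
        byte[] utf8 = {0x41, (byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E};
        String s = fromUtf8(utf8);
        System.out.println("code units:  " + codeUnits(s));   // 3: 'A' plus a surrogate pair
        System.out.println("code points: " + codePoints(s));  // 2
        System.out.println("high surrogate at index 1: "
                + Character.isHighSurrogate(s.charAt(1)));    // true
    }
}
```

String.codePointCount() and Character.isHighSurrogate() are standard JDK calls for code that must not split a surrogate pair when indexing or truncating strings.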

For Java: http://java.sun.com/j2se/corejava/intl/reference/faqs/index.html

Karsten
> 
> Bye,
> 
> [1] http://www.debian.org/doc/manuals/intro-i18n/ch-codes.en.html#s-surrogate
> 
> --
>   Dr. Ludovic Rousseau
> 
> _______________________________________________
> Muscle mailing list
> [email protected]
> http://lists.drizzle.com/mailman/listinfo/muscle
