Ludovic Rousseau wrote:
> On 26/04/06, Peter Tomlinson <[EMAIL PROTECTED]> wrote:
>
>>Ludovic Rousseau wrote:
>>
>>>According to [1] you may code some unicode characters on
>>>4 bytes.
>>>[1] http://en.wikipedia.org/wiki/UTF-16
>>
>>You should consult ISO 10646 [1].
>>
>>The advice that I was given when having to incorporate multiple
>>character sets into eURI [2] was that it is satisfactory to restrict an
>>implementation to UTF-16, as that covers all commercially and government
>>used written scripts. But designers should make a statement that UTF-16
>>is used in their work (I'm not sure that I made that clear in eURI...).
>
>
> I think I know why Microsoft or Java uses UCS-2. Unicode 1.0 was only
> 16 bits [1].
>
> But I don't see why UTF-16 is better than UTF-8 if the choice is made
> _now_. Maybe because functions to manipulate UTF-8 are not available
> in Windows and Java?
For Windows, see MultiByteToWideChar() and WideCharToMultiByte().
Java has UTF-8 support too; UTF-8 just has to be specified explicitly
as the encoding, and then it can be handled.

For Windows, I think the reason is the fixed size of two bytes per
character (as in UCS-2), which makes string manipulation routines faster.

For Java, see:
http://java.sun.com/j2se/corejava/intl/reference/faqs/index.html

Karsten

>
> Bye,
>
> [1] http://www.debian.org/doc/manuals/intro-i18n/ch-codes.en.html#s-surrogate
>
> --
> Dr. Ludovic Rousseau
>
> _______________________________________________
> Muscle mailing list
> [email protected]
> http://lists.drizzle.com/mailman/listinfo/muscle
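A minimal Java sketch of two points from this thread (the sketch itself is not from the thread). First, Java handles UTF-8 once the charset is named explicitly. Second, UTF-16 is not a fixed two bytes per character: a code point outside the Basic Multilingual Plane, such as U+1D11E, needs a surrogate pair, which is 4 bytes. The sketch assumes Java 7 or later for `StandardCharsets`, and the class and method names (`EncodingDemo`, `utf8Length`, `utf16Length`, `roundTripUtf8`) are hypothetical and for illustration only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the encoded size of the same text in
// UTF-8 and UTF-16, and shows that UTF-16 is variable-width beyond the BMP.
public class EncodingDemo {

    // Bytes needed when the string is encoded as UTF-8 (charset named explicitly).
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed when encoded as UTF-16BE (big-endian, no byte order mark).
    public static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Encode to UTF-8 and decode with the same charset; returns the original text.
    public static String roundTripUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII
            "\u00e9",       // U+00E9, Latin small letter e with acute
            "\u20ac",       // U+20AC, euro sign
            "\uD834\uDD1E"  // U+1D11E, musical G clef, stored as a surrogate pair
        };
        for (String s : samples) {
            // length() counts UTF-16 code units; codePointCount counts characters.
            System.out.printf("U+%04X: code units=%d code points=%d UTF-8=%d bytes UTF-16=%d bytes%n",
                s.codePointAt(0), s.length(), s.codePointCount(0, s.length()),
                utf8Length(s), utf16Length(s));
        }
    }
}
```

For these samples, UTF-8 uses 1, 2, 3 and 4 bytes, while UTF-16 uses 2, 2, 2 and 4 bytes. So neither encoding is fixed-width once characters outside the BMP are allowed; only UCS-2 is, and UCS-2 cannot represent U+1D11E at all.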
