On Thu, 3 July 2003 20:13, Beni Cherniavsky wrote:
> > What kind of string processing UTF-8 makes simpler than UTF-32?
>
> Any processing that works now on ASCII and doesn't break UTF-8
> sequences in the middle.
It's not simpler, it's the same. Since the language is not C, it doesn't
matter that C provides some functions working on text - they won't be used
anyway because they break on '\0' characters (and ANSI C doesn't provide
anything interesting).
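
A minimal sketch of the '\0' problem, written in Python rather than C and
using ctypes only to mimic how a NUL-terminated ``char *`` is read. The
sample string and the variable names are assumptions for illustration:

```python
import ctypes

# A string in a non-C language may legitimately contain U+0000.
text = "ab\x00cd"
data = text.encode("utf-8")      # 5 bytes, one of which is 0x00

# Viewing the same bytes as a NUL-terminated C string stops at the
# first zero byte, which is all that strlen() and friends would see.
c_view = ctypes.c_char_p(data).value

print(len(data), c_view)
```

The language-level string keeps all five characters, while the C view
silently loses everything after the embedded zero.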
> UTF-8 is better for interfacing to
> unicode-unaware libraries that take things like file paths and will
> split them on ``/`` and will certainly not expect null bytes in the
> middle.
Filenames must be converted anyway, and there aren't many C libraries which
require UTF-8. Gtk+2 is one such library, but most libraries expect the
encoding of the current locale, which is not necessarily UTF-8. Pure string
processing which doesn't care what encoding is used must be written from
scratch anyway (at least because of memory management issues and '\0's).
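
The byte-level properties under discussion can be checked with a short
sketch. It assumes a made-up path and uses Python only as a convenient
notation; it says nothing about which encoding a given C library expects:

```python
path = "/home/zażółć/plik.txt"   # hypothetical path with non-ASCII characters

utf8 = path.encode("utf-8")
utf32 = path.encode("utf-32-le")

# Splitting the UTF-8 bytes on the ASCII byte 0x2F is safe: every byte of
# a multi-byte UTF-8 sequence is >= 0x80, so no sequence is cut in half
# and each piece decodes cleanly on its own.
parts = [p.decode("utf-8") for p in utf8.split(b"/")]

# UTF-8 has a zero byte only for U+0000 itself; UTF-32 pads every
# character below U+01000000 with at least one zero byte.
utf8_has_nul = b"\x00" in utf8
utf32_has_nul = b"\x00" in utf32

print(parts, utf8_has_nul, utf32_has_nul)
```

This illustrates the quoted claim only; a library that expects the locale
encoding still needs the conversion step regardless of the internal format.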
> UTF-8 will "just work" for most ``char *`` C APIs, that's the
> whole point of it.
Yes, but in a language which is not C you don't have legacy APIs.
If I used UTF-8, interfacing to C would be easier in some cases, but the
language itself would be uglier and harder to use. Having a self-consistent
language is more important to me than being mistake-compatible with C.
--
__("< Marcin Kowalczyk
\__/ [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/