Marcin 'Qrczak' Kowalczyk wrote on 2003-07-03:

> On Thu, 3 July 2003 20:13, Beni Cherniavsky wrote:
>
> > > What kind of string processing UTF-8 makes simpler than UTF-32?
> >
> > Any processing that works now on ASCII and doesn't break UTF-8
> > sequences in the middle.
>
> It's not simpler, it's the same. Since the language is not C, it doesn't
> matter that C provides some functions working on text - they won't be used
> anyway because they break on '\0' characters (and ANSI C doesn't provide
> anything interesting).
>
But there are C libraries out there that do provide interesting
features.  I was referring to interfaces to such libraries.  If you
must allow '\0' as part of a string, then indeed all existing
libraries are out (unless you do something ugly like encoding '\0' with
a non-minimal UTF-8 sequence).

> > UTF-8 is better for interfacing to unicode-unaware libraries that
> > take things like file paths and will split them on ``/`` and will
> > certainly not expect null bytes in the middle.
>
> Filenames must be converted anyway, and there aren't many C
> libraries which require UTF-8. Gtk+2 is one such library, but
> most libraries expect the encoding of the current locale, not
> necessarily UTF-8. Pure string processing which doesn't care what
> encoding is used must be written from scratch anyway (at least
> because of memory management issues and '\0's).
>
> > UTF-8 will "just work" for most ``char *`` C APIs, that's the
> > whole point of it.
>
> Yes, but in a language which is not C you don't have legacy APIs.
>
> If I used UTF-8, interfacing to C would be easier in some cases, but the
> language itself would be uglier and harder to use. Having a
> self-consistent language is more important for me than being
> mistake-compatible with C.
>
OK, if your reliance on C APIs is scarce, I don't see much reason to
prefer UTF-8.

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>

"Reading the documentation I felt like a kid in a toy shop."
 -- Phil Thompson on Python's standard library
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
