Re: Strings in a programming language

Marcin 'Qrczak' Kowalczyk Thu, 03 Jul 2003 12:06:16 -0700

On Thu, 3 July 2003 20:13, Beni Cherniavsky wrote:

> > What kind of string processing UTF-8 makes simpler than UTF-32?
>
> Any processing that works now on ASCII and doesn't break UTF-8
> sequences in the middle.


It's not simpler, it's the same. Since the language is not C, it doesn't 
matter that C provides some functions working on text - they won't be used 
anyway because they break on '\0' characters (and ANSI C doesn't provide 
anything interesting).
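
To illustrate the '\0' problem, here is a minimal sketch in Python using
ctypes. The helper name as_c_string is hypothetical; it only shows what a
NUL-terminated C API would see of a byte string that contains an embedded
zero byte.

```python
import ctypes

def as_c_string(data):
    """Return the part of `data` that a NUL-terminated C API would see."""
    # create_string_buffer copies the bytes and appends a terminating NUL.
    buf = ctypes.create_string_buffer(data)
    # .value reads the buffer as a C string, stopping at the first NUL;
    # .raw would return the whole buffer instead.
    return buf.value

text = b"ab\x00cd"
visible = as_c_string(text)   # everything after the embedded NUL is lost
```

The language's own string type knows the full length, so it has to carry
that length itself instead of relying on a terminator.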

> UTF-8 is better for interfacing to
> unicode-unaware libraries that take things like file paths and will
> split them on ``/`` and will certainly not expect null bytes in the
> middle.

Filenames must be converted anyway, and there aren't many C libraries which 
require UTF-8. Gtk+2 is one such library, but most libraries expect the 
encoding of the current locale, which is not necessarily UTF-8. Pure string
processing which doesn't care what encoding is used must be written from
scratch anyway (at least because of memory management issues and '\0's).
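
A rough sketch of that conversion step, in Python. It assumes the language
keeps strings as Unicode internally and encodes them only at the boundary;
to_locale_bytes is a hypothetical helper, not an existing API.

```python
import locale

def to_locale_bytes(text, encoding=None):
    """Encode `text` for a C library that expects the locale's encoding."""
    if encoding is None:
        # Ask for the locale's preferred encoding without changing the locale.
        encoding = locale.getpreferredencoding(False)
    # Raises UnicodeEncodeError if the locale cannot represent the text.
    return text.encode(encoding)

name = "za\u017c\u00f3\u0142\u0107"             # a Polish file name
latin2 = to_locale_bytes(name, "iso-8859-2")    # one byte per character
utf8 = to_locale_bytes(name, "utf-8")           # non-ASCII letters take two bytes
```

The same name gives different bytes under different locales, so the
conversion cannot be skipped just because the internal form is UTF-8.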

> UTF-8 will "just work" for most ``char *`` C APIs, that's the
> whole point of it.

Yes, but in a language which is not C you don't have legacy APIs.
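
The property the quoted text relies on can be checked with a short Python
sketch (the helper names are made up for illustration): in UTF-8 the bytes
of a multi-byte sequence are never ASCII bytes, so splitting on "/" at the
byte level is safe and no zero bytes appear, while UTF-32 is full of zero
bytes.

```python
def utf8_split_path(path):
    """Split a path on '/' at the byte level, as an ASCII-only library would."""
    raw = path.encode("utf-8")
    # Bytes of multi-byte UTF-8 sequences are all >= 0x80, so b"/" can only
    # match a real slash, never the middle of another character.
    return [part.decode("utf-8") for part in raw.split(b"/")]

def has_zero_byte(text, encoding):
    """Report whether the encoded form contains a 0x00 byte."""
    return 0 in text.encode(encoding)

path = "/home/za\u017c\u00f3\u0142\u0107/plik.txt"
parts = utf8_split_path(path)
```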

If I used UTF-8, interfacing to C would be easier in some cases, but the
language itself would be uglier and harder to use. Having a self-consistent
language is more important to me than being mistake-compatible with C.

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
