Marcin 'Qrczak' Kowalczyk wrote on 2003-07-03:

> On Thu, 3 July 2003 20:13, Beni Cherniavsky wrote:
>
> > > What kind of string processing UTF-8 makes simpler than UTF-32?
> >
> > Any processing that works now on ASCII and doesn't break UTF-8
> > sequences in the middle.
>
> It's not simpler, it's the same. Since the language is not C, it doesn't
> matter that C provides some functions working on text - they won't be used
> anyway because they break on '\0' characters (and ANSI C doesn't provide
> anything interesting).
>
But there are C libraries out there that do provide interesting
features.  I was referring to interfaces to such libraries.  If you
must allow '\0' as part of a string, then indeed all existing
libraries are out (unless you do something ugly like encoding '\0'
with a non-minimal UTF-8 sequence).

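
To make the "something ugly" concrete, here is a rough Python sketch of
the non-minimal encoding trick: U+0000 is written as the two-byte
overlong sequence C0 80 (the same substitution Java's "modified UTF-8"
uses for U+0000), so the byte string never contains a zero byte and
survives NUL-terminated ``char *`` APIs.  The function names are made up
for illustration and are not from any library discussed in this thread.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Encode as UTF-8, but write U+0000 as the overlong pair C0 80.

    Hypothetical helper: plain UTF-8 produces a 0x00 byte only for
    U+0000, so a byte-level replace after encoding is sufficient.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_modified_utf8(data: bytes) -> str:
    """Reverse encode_modified_utf8.

    The byte 0xC0 never occurs in well-formed UTF-8, so every C0 80
    pair in the input must stand for U+0000.
    """
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


def is_strict_utf8(data: bytes) -> bool:
    """Report whether a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    encoded = encode_modified_utf8("a\x00b")
    print(encoded)                        # no zero byte in the output
    print(is_strict_utf8(encoded))        # strict decoders reject C0 80
    print(decode_modified_utf8(encoded) == "a\x00b")
```

The ugliness is visible in ``is_strict_utf8``: the result is no longer
well-formed UTF-8, so any library that validates its input will refuse
it, and every boundary between the language and C needs the extra
encode/decode step.
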
> > UTF-8 is better for interfacing to Unicode-unaware libraries that
> > take things like file paths and will split them on ``/`` and will
> > certainly not expect null bytes in the middle.
>
> Filenames must be converted anyway, and there aren't many C
> libraries which require UTF-8. Gtk+2 is one of such libraries, but
> most libraries expect the encoding of the current locale, not
> necessarily UTF-8. Pure string processing which doesn't care what
> encoding is used must be written from scratch anyway (at least
> because of memory management issues and '\0's).
>
> > UTF-8 will "just work" for most ``char *`` C APIs, that's the
> > whole point of it.
>
> Yes, but in a language which is not C you don't have legacy APIs.
>
> If I used UTF-8, interfacing to C is easier in some cases but the
> language itself is uglier and harder to use. Having a
> self-consistent language is more important for me than being
> mistake-compatible with C.
>
OK, if your reliance on C APIs is scarce, I don't see much reason to
prefer UTF-8.

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>

"Reading the documentation I felt like a kid in a toy shop."
    -- Phil Thompson on Python's standard library

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
