Hi Bruno,
On Fri, 13 Jul 2001, Bruno Haible wrote:
> Christoph Rohland writes:
>
>> So if you assume that the source file is in UTF-8 normal string
>> literals should be UTF-8.
>
> Yes, but only if the compiler is gcc, and no "coding:" marker is at
> the top of the file, and no overruling command line option has been
> given.
Yes, but perhaps we could try to make that standard?
>> And this case would be handled without special casing, right?
>
> The internal processing for UTF-8 string literals in this case would
> be trivial. But the internal processing for UTF-16 or UCS-4 string
> literals is not complicated either.
>
> The important point is that there be an agreement across several
> compiler vendors what u8"...", u16"..." and u32"..." mean and how
> the types are called.
From my point of view this sounds reasonable.
> (Can't we use uint_least16_t instead of utf16_t?)
No, I think one of the biggest mistakes in the C standard is that the
width of char/wchar_t is not fixed. We need an exact 16-bit type with a
defined encoding. It is complicated enough to handle characters wider
than 8 bits in a heterogeneous networked environment, but a variable
width of the units is not acceptable (at least for us).
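To make the difference concrete, here is a minimal sketch, assuming
utf16_t is simply a typedef for C99's uint16_t (the name utf16_t is only
the one proposed in this thread, and utf16_encode is a hypothetical
helper, not part of any library). The compile-time check rejects any
type that is not exactly 16 bits wide:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type name from this thread. uint16_t is exactly 16 bits
   wide; it exists only on platforms that provide such a type. */
typedef uint16_t utf16_t;

/* Compile-time check: the array size is negative, and compilation
   fails, unless utf16_t is exactly 16 bits wide. */
typedef char utf16_t_must_be_16_bits[(sizeof(utf16_t) * CHAR_BIT == 16) ? 1 : -1];

/* Encode one code point as UTF-16 into out[0..1].
   Returns the number of code units written (1 or 2),
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf16_encode(uint32_t cp, utf16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (utf16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (utf16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (utf16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

If the typedef were changed to uint_least16_t, the check would fail on
any platform where that type is wider than 16 bits, which is exactly the
case we cannot accept when the buffer goes over the wire unchanged.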
>> For UTF-8 see above, for UCS-4 I thought the wchar_t is the right
>> representation.
>
> Currently only on glibc systems. wchar_t == UCS-4 is only a
> recommendation in ISO C 99, not mandatory (unfortunately).
No, it will be UCS-4 on all Unix systems we support: Solaris, Tru64,
HP-UX, AIX 5L, Reliant. But you are probably right: if you want to
think it through to the end, you will also introduce a utf32_t (or is
it ucs4_t?).
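For what it is worth, a program can at least test what a given platform
promises. The following is only a sketch: __STDC_ISO_10646__ and
WCHAR_MAX come from ISO C 99, while the three function names are
hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <wchar.h>

/* Returns 1 if the implementation defines the C99 macro
   __STDC_ISO_10646__, i.e. states that wchar_t values are
   ISO 10646 code points; returns 0 otherwise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if wchar_t can represent every value up to 0x10FFFF. */
int wchar_holds_ucs4(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Store a code point in a wchar_t. Only meaningful where both
   checks above succeed; the assertion guards the range. */
wchar_t wchar_from_ucs4(unsigned long cp)
{
    assert(cp <= 0x10FFFFUL && cp <= (unsigned long)WCHAR_MAX);
    return (wchar_t)cp;
}
```

Code that needs UCS-4 everywhere cannot rely on these checks passing,
which is the argument for a separate utf32_t.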
Greetings
Christoph
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/