Hi Bruno,

On Fri, 13 Jul 2001, Bruno Haible wrote:
> Christoph Rohland writes:
> 
>> So if you assume that the source file is in UTF-8, normal string
>> literals should be UTF-8.
> 
> Yes, but only if the compiler is gcc, and no "coding:" marker is at
> the top of the file, and no overruling command line option has been
> given.

Yes, but perhaps we could try to make that standard?

>> And this case would be handled without special casing, right?
> 
> The internal processing for UTF-8 string literals in this case would
> be trivial. But the internal processing for UTF-16 or UCS-4 string
> literals is not complicated either.
> 
> The important point is that there be an agreement across several
> compiler vendors what u8"...", u16"..." and u32"..." mean and how
> the types are called.

From my point of view this sounds reasonable.
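
To illustrate why the UTF-16 case is not complicated either, here is a
minimal sketch of the per-character work needed to produce UTF-16 code
units from a code point. The function name utf16_encode is hypothetical
and not taken from any compiler; it only shows the surrogate arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out and returns how many were written. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Surrogate values and anything above U+10FFFF are not valid input. */
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));

    if (cp < 0x10000) {
        /* Basic Multilingual Plane: one code unit, value unchanged. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: split the 20-bit offset into a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For UCS-4 no transformation is needed at all; each code point is stored
as is.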

> (Can't we use uint_least16_t instead of utf16_t?)

No, I think one of the biggest mistakes in the C standard is that the
sizes of char and wchar_t are not fixed. We need an exact 16-bit type
with a defined encoding. It is complicated enough to handle characters
wider than 8 bits in a heterogeneous networked environment; code units
whose width varies from platform to platform are not acceptable (at
least for us).
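
A rough sketch of the kind of code this matters for, assuming utf16_t
is simply a typedef for uint16_t from <stdint.h>. Both the typedef and
the function name utf16_to_wire are hypothetical. With an exact 16-bit
unit, every code unit maps to exactly two bytes on the wire; with
uint_least16_t the stored value could be wider on some platform, and
the shift and mask below would silently drop the extra bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t utf16_t;  /* hypothetical name; exactly 16 bits wide */

/* Hypothetical helper: write n UTF-16 code units to buf in big-endian
 * byte order, two bytes per unit. buf must hold at least 2 * n bytes.
 * Returns the number of bytes written. */
size_t utf16_to_wire(const utf16_t *s, size_t n, unsigned char *buf)
{
    assert(s != NULL && buf != NULL);

    for (size_t i = 0; i < n; i++) {
        buf[2 * i]     = (unsigned char)(s[i] >> 8);    /* high byte first */
        buf[2 * i + 1] = (unsigned char)(s[i] & 0xFF);  /* then low byte */
    }
    return 2 * n;
}
```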

>> For UTF-8 see above, for UCS-4 I thought the wchar_t is the right
>> representation.
> 
> Currently only on glibc systems. wchar_t == UCS-4 is only a
> recommendation in ISO C 99, not mandatory (unfortunately).

No, it will be on all Unix systems we support: Solaris, Tru64,
HP-UX, AIX 5L, Reliant. But you are probably right: if you want to
think this through to the end, you will also introduce a utf32_t (or
is it ucs4_t?).
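
Whether wchar_t can be relied on as UCS-4 can at least be checked at
compile time. The following is only a sketch: the C99 macro
__STDC_ISO_10646__ and WCHAR_MAX are standard, but the two function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Returns 1 if the implementation declares that wchar_t values are
 * ISO 10646 code points and wchar_t is wide enough for U+10FFFF,
 * otherwise 0. */
int wchar_is_ucs4(void)
{
#if defined(__STDC_ISO_10646__) && WCHAR_MAX >= 0x10FFFF
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: store one code point in a wchar_t.
 * Only meaningful on platforms where wchar_is_ucs4() returns 1. */
wchar_t wchar_from_ucs4(uint32_t cp)
{
    assert(wchar_is_ucs4() && cp <= 0x10FFFF);
    return (wchar_t)cp;
}
```

Where the check fails, a separate utf32_t would be the only portable
choice.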

Greetings
                Christoph


-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
