Hi Bruno,

On Thu, 12 Jul 2001, Bruno Haible wrote:
> Christoph Rohland writes:
> 
>> >                     u"UTF-8 string literal"
>> > 
>> > This way no extra 16-bit string functions are needed - the 8-bit
>> > str* functions in libc will do.
>> 
>> Why do you need a special utf8 string literal? UTF8 can be based on
>> standard string literals since in the ASCII range it is the same
>> and the basic entity is 8bit.
> 
> If we design such a feature like u"..." it ought to be usable for
> non-ASCII characters as well (such as the quote characters contained
> in your .doc file). And C 99 doesn't provide for a way to reliably
> produce UTF-8 strings, other than hex or octal escapes:
> "\xe2\x82\xac". Thus it is the same problem as you are having, and
> merits to be solved the same way.
> 
> AFAIK, gcc will by default assume that source files are in UTF-8 if
> no "-*- coding: XXX -*-" signature is present at the top. But that
> doesn't solve the problem when this "coding:" signature is given -
> in that case we wish that the compiler converts the u"..." strings
> from the given encoding to Unicode -, and it doesn't work for other
> compilers than gcc.

So if you assume that the source file is in UTF-8, normal string
literals should already be UTF-8, and this case would be handled
without special casing, right?
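
To illustrate what I mean, here is a minimal sketch in C. The helper
name utf8_count_codepoints is made up for this example, and it assumes
well-formed UTF-8 input. The euro sign is written with the hex escapes
you mention, so the bytes are fixed whatever the source encoding is,
and the ordinary 8-bit str* functions work on it bytewise:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libc): count the code points in a
   NUL-terminated UTF-8 string by skipping continuation bytes, which
   have the bit pattern 10xxxxxx. Assumes well-formed input. */
size_t utf8_count_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* U+20AC EURO SIGN followed by " 10", as an ordinary string literal.
   The hex escapes fix the bytes independently of the source encoding. */
static const char euro_price[] = "\xe2\x82\xac 10";
```

Here strlen(euro_price) counts bytes while the helper counts
characters, and strstr() can search for the euro sign without knowing
anything about UTF-8.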

>> we will do that after the discussion if the _feature_ is welcome.
> 
> I will welcome it if
> 
>   1) There are similar facilities for UTF-8 and UCS-4 encoded
>      strings.

For UTF-8 see above; for UCS-4 I thought wchar_t is the right
representation.

>   2) A library API for elementary string manipulations on such
>      strings (both for UTF-16 and UCS-4) gets standardized. ISO C 99
>      wchar_t APIs are not well usable in practice because you don't
>      know what wchar_t is.

Yes, that's our point: we need to know which type we use. Actually we
need a UTF-16 type because of the resource usage. Our I18N group has
also proposed the library API. But to use this library efficiently
you need support in the compiler (first).
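
As a rough sketch of the kind of fixed-width type and elementary
functions I have in mind (this is not the API our I18N group proposed;
the names utf16_t, utf16_len and utf16_encode are invented here, and I
assume a C 99 <stdint.h> that provides uint16_t):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit code unit type with a known width, unlike wchar_t. */
typedef uint16_t utf16_t;

/* Length of a 0-terminated UTF-16 string in code units (like strlen). */
size_t utf16_len(const utf16_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Encode one code point as UTF-16. Returns the number of code units
   written (1 or 2), or 0 if cp is a surrogate or above U+10FFFF. */
int utf16_encode(uint32_t cp, utf16_t out[2])
{
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* lone surrogates are not characters */
        out[0] = (utf16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                  /* 20 bits remain, split 10 + 10 */
    out[0] = (utf16_t)(0xD800 + (cp >> 10));
    out[1] = (utf16_t)(0xDC00 + (cp & 0x3FF));
    return 2;
}
```

Without compiler support, every string for these functions has to be
spelled out as an array of numbers like { 0x20AC, 0 }, which is exactly
why we would like a literal syntax first.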

Greetings
                Christoph


-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
