Wilhelm Nößer writes:
> We'd like to point out that the literals are the most interesting point.
>
> Reason:
> missing functions can be implemented by anyone who thinks he/she
> needs them, but the literals and their structure must be defined
> by the compiler.
>
> Especially when you want to port existing code to Unicode, you need
> a way to represent your usual (English, 7-bit, ...) string literals in
> Unicode format.
>
> The format of the string literals determines the way other
> strings are handled.
>
> We DO NOT want to write Hebrew, Arabic, ... glyphs in our sources!
The C/C++ compiler does what is specified in the language spec. The
language spec provides no means to convert an ASCII string to
a uint16_t array at compile time. To work around this, you have two
options:
- Do the conversion at run time (in C++ possibly at static
initialization time); see the sketch after this list.
- Do the conversion in a preprocessing stage before you call the
C/C++ compiler.
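
As a rough illustration of the first option, here is a minimal C++
sketch that widens a 7-bit ASCII literal at static initialization
time. The helper name ascii_to_u16 is mine, not a standard function,
and it assumes the input really is plain ASCII:

#include <stdint.h>
#include <vector>

/* Widen a 7-bit ASCII string to 16-bit units; for ASCII the Unicode
   code point equals the byte value, so no table lookup is needed.
   (ascii_to_u16 is a hypothetical helper, not a standard API.) */
static std::vector<uint16_t> ascii_to_u16(const char *s)
{
    std::vector<uint16_t> out;
    for (; *s != '\0'; ++s)
        out.push_back(static_cast<unsigned char>(*s));
    out.push_back(0);              /* keep the terminating NUL */
    return out;
}

/* Initialized before main() as part of static initialization. */
static const std::vector<uint16_t> greeting = ascii_to_u16("Hello, world");

For the second option, a small script or code generator can emit the
equivalent { 'H', 'e', ..., 0 } initializer, so that no run-time work
is needed at all.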
> There are good reasons why Java and IBM's ICU have chosen UTF-16 over
> any other implementation.
Java chose UCS-2, not UTF-16, the major reason being to be able to
address the n-th character of a string in constant time. Now they are
starting to add UTF-16 support...
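
To make the constant-time argument concrete: in UTF-16, code points
outside the BMP occupy two 16-bit units (a surrogate pair), so
locating the n-th character requires a linear scan, whereas UCS-2
indexing is a plain array access. A sketch (the function name is
mine, for illustration only):

#include <stddef.h>
#include <stdint.h>

/* Return the index of the 16-bit unit where the n-th character
   (counting from 0) starts, scanning past surrogate pairs.
   This is O(n); UCS-2 avoids the scan by construction. */
static size_t utf16_char_index(const uint16_t *s, size_t len, size_t n)
{
    size_t i = 0;
    while (i < len && n > 0) {
        /* A high surrogate (0xD800..0xDBFF) opens a two-unit pair. */
        i += (s[i] >= 0xD800 && s[i] <= 0xDBFF) ? 2 : 1;
        n--;
    }
    return i;
}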
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/