>> So gftCode*  routines should be added in order to separate the two
>> types of values.
>
>Please don't. _Only_ handle the Unicode and add an API to convert from unicode
>to ascii and from ascii to unicode. This simplifies the API and also allows to
>add more charsets.


But there isn't any need at all to convert from Unicode to ASCII or the other way 
around: Unicode and ASCII use the same
char codes for ASCII characters, ranging from 0 to 127.

Actually, this applies to all chars in the Latin-1 charmap: each value between 0 and 255 maps 
to the same character in both Unicode and Latin-1.
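For example, 'é' is code 233 (0xE9) both as a Latin-1 byte and as the Unicode code point 
U+00E9, so a Latin-1 value can be used directly as a Unicode char code.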

The problem we were discussing dealt with passing libgft negative char codes. In my 
last email to Lee, I proposed something
which might be considered a temporary solution until support for different character 
encodings is built into libgft:

Since Unicode only uses 16-bit char codes, and the functions in libgft handle char 
codes as uint32, I believe we have a way
to deal with negative char codes without disturbing Unicode support:

if (char_code > 0x10000)   /* i.e. 2^16, written as a hex literal since `^' is XOR in C */
  char_code = (unsigned char) char_code;

This way, a valid Unicode value (<= 2^16) won't be changed at all. But if the calling function 
passes a negative char code, the conversion to uint32 (which will always happen, since the 
libgft functions are prototyped as "function_name(/*...*/, uint32 char_code);") will result in 
a value larger than 2^16. This assumes the caller passed us a value > -2^16, which is certainly 
the case if s/he is using signed char, whose range only goes down to -128.

In these cases, we can cast to (unsigned char) and obtain a valid Unicode code which 
refers to the corresponding Latin-1
char.
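To make this concrete, here is a small, self-contained sketch of the idea; normalize_char_code() 
is only a hypothetical stand-in for the check that would sit inside the libgft functions, not 
actual libgft API:

#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: the check a libgft function would apply to its char code argument. */
static uint32_t normalize_char_code(uint32_t char_code)
{
  if (char_code > 0x10000)                  /* cannot be a 16-bit Unicode value */
    char_code = (unsigned char) char_code;  /* keep only the Latin-1 byte       */
  return char_code;
}

int main(void)
{
  unsigned char uc = 0xE9;                /* 'é' in Latin-1 (233)                    */
  signed char   sc = (signed char) 0xE9;  /* same byte as signed char: -23 on gcc/x86 */

  /* Unsigned char: already a valid Unicode value, passes through unchanged. */
  printf("unsigned: U+%04lX\n", (unsigned long) normalize_char_code(uc));

  /* Signed char: -23 becomes 0xFFFFFFE9 when converted to uint32, which is  */
  /* larger than 2^16, so the cast recovers the Latin-1/Unicode value 0xE9.  */
  printf("signed:   U+%04lX\n", (unsigned long) normalize_char_code((uint32_t) sc));

  return 0;
}

Both calls print U+00E9, which is exactly the Latin-1 character the caller meant.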

Even when libgft supports different character encodings, it will still be necessary to support 
signed chars being passed as an argument (i.e., it will have to handle negative char codes and 
map them to Unicode), and charmaps other than Latin-1 will involve a more complicated approach 
than just casting those values to (unsigned char).
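Just to illustrate what that "more complicated approach" could look like, here is a rough, 
hypothetical sketch of a per-charset lookup for an 8-bit encoding whose 0x80-0x9F range does 
not match Unicode (Windows-1252 in this example); only two real entries are filled in, and 
none of this is a proposed libgft API:

/* Hypothetical per-charset mapping; a real one would cover the whole 0x80-0x9F range. */
static uint32_t cp1252_to_unicode(unsigned char byte)
{
  switch (byte)
  {
  case 0x80: return 0x20AC;  /* EURO SIGN           */
  case 0x85: return 0x2026;  /* HORIZONTAL ELLIPSIS */
  /* ... the remaining 0x80-0x9F slots need their own entries ... */
  default:   return byte;    /* 0x00-0x7F and 0xA0-0xFF agree with Latin-1/Unicode */
  }
}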

But until then, I see no harm in having Latin-1 support built into libgft already:

- If the calling function is using unsigned chars, libgft can use the char codes 
without any transformation since they have
the same values in Unicode;

- If the calling function is using signed chars (e.g., plain "char" is signed with gcc under 
Linux using the default options), non-ASCII characters (< 0) will be converted to their correct 
Unicode value (127 < code <= 255) by the test I described above.

I think that this way we don't hinder future support for other character encodings, while 
solving the only problem that people using Latin-1 faced when using libgft; what do you think?

Cheers,

Manuel
