Markus Kuhn writes:

>   - You might have thought that supporting ISO 2022 locales is
>     incompatible with __STDC_ISO_10646__. First of all, ISO 2022
>     is *not* at all suitable for use in Unix locales anyway, so the
>     issue should really be irrelevant. But just for the sake of argument,
>     assuming you still really have to use it for whatever reasons,
>     you will find that the UCS-4 private use areas are more than big
>     enough to map all registered ISO 2022/ISO 2375 encodings into them.
> 
>     Essentially, you store character X from ISO-IR encoding number Y
>     as
> 
>       (wchar_t) (0x60000000 + Y * 0x200000 + Conv_ISO_IR_Y_to_UCS(X))

But if __STDC_ISO_10646__ is defined, this creates wchar_t values in
a range where applications will not expect them.
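
To make the concern concrete, here is a minimal sketch of the quoted
mapping. The function names are hypothetical, and the check against
0x10FFFF assumes an application that only handles the Unicode range;
neither comes from the original proposal.

```c
#include <assert.h>
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu   /* highest Unicode code point */

/* Map a UCS value `ucs`, already converted from ISO-IR registration
   number `y`, into the private-use range, following the quoted formula.
   (Hypothetical helper; uint32_t stands in for a 32-bit wchar_t.) */
uint32_t iso_ir_to_private(uint32_t y, uint32_t ucs)
{
    assert(y < 0x100);        /* keeps the result below 0x80000000 */
    assert(ucs < 0x200000);   /* must fit into the per-encoding slot */
    return 0x60000000u + y * 0x200000u + ucs;
}

/* What an application might check, assuming it only expects values
   in the Unicode range 0..0x10FFFF. */
int is_unicode_code_point(uint32_t wc)
{
    return wc <= UNICODE_MAX;
}
```

Every value produced by iso_ir_to_private() is at least 0x60000000,
so a check like is_unicode_code_point() rejects all of them.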

A different (and more standards-conformant) way to define wchar_t in
an ISO-2022 locale is to use the Unicode 3.1 language tags.

An ISO-2022 escape sequence that switches to a Japanese charset is
thus encoded as

            0xE0001  LANGUAGE TAG
            0xE006A  'j'
            0xE0061  'a'

Similarly for Chinese and Korean charsets.

An ISO-2022 escape sequence that switches to a European Cyrillic
encoding (i.e. removes the CJK bias) is encoded as

            0xE007F  CANCEL TAG

This way, you turn the multibyte sequence into a widechar sequence
that can be converted back to multibyte without loss of information.
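
As a rough sketch of that round trip, the code below turns a language
code such as "ja" into tag characters and back again. The function
names and the use of uint32_t in place of wchar_t are assumptions made
for illustration. How a particular ISO-2022 escape sequence is matched
to a language code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANGUAGE_TAG 0xE0001u   /* U+E0001 LANGUAGE TAG */
#define TAG_BASE     0xE0000u   /* tag characters mirror ASCII 0x20..0x7E */

/* Write LANGUAGE TAG followed by one tag character per ASCII character
   of `lang`. Returns the number of values written, or 0 if `out` is
   too small. (Hypothetical helper.) */
size_t encode_lang_tag(const char *lang, uint32_t *out, size_t cap)
{
    size_t len = strlen(lang);
    if (len + 1 > cap)
        return 0;
    out[0] = LANGUAGE_TAG;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)lang[i];
        assert(c >= 0x20 && c <= 0x7E);   /* only ASCII has tag twins */
        out[i + 1] = TAG_BASE + c;
    }
    return len + 1;
}

/* Recover the ASCII language code from a tag sequence that starts with
   LANGUAGE TAG. Returns the length of the NUL-terminated string stored
   in `out`, or 0 if the input does not start with LANGUAGE TAG. */
size_t decode_lang_tag(const uint32_t *in, size_t n, char *out, size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return 0;
    out[0] = '\0';
    if (n == 0 || in[0] != LANGUAGE_TAG)
        return 0;
    for (size_t i = 1; i < n && len + 1 < cap; i++) {
        if (in[i] < TAG_BASE + 0x20 || in[i] > TAG_BASE + 0x7E)
            break;                        /* end of the tag sequence */
        out[len++] = (char)(in[i] - TAG_BASE);
    }
    out[len] = '\0';
    return len;
}
```

Encoding "ja" yields the three values 0xE0001, 0xE006A, 0xE0061 listed
above, and decoding them returns "ja" again.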

Of course the resulting wide character sequence is stateful. But the
multibyte sequence is stateful as well. That's what you get for
wanting ISO 2022...

Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
