I've recently been looking at using UTF-8 locales with Xlib's
i18n code, and the conclusion I seem to be coming to is that the
contents of nls/XLC_LOCALE/en_US.utf8 are entirely and completely bogus.

What is in there is something like

XCOMM   fs0 class
fs0     {
        charset {
                name    ISO10646-1
        }
        font    {
                primary ISO10646-1
        }
}
XCOMM   We leave the legacy encodings in for the moment, because we don't
XCOMM   have that many ISO10646 fonts yet.
XCOMM   fs1 class (7 bit ASCII)
fs1     {
        charset {
                name    ISO8859-1:GL
        }
        font    {
                primary         ISO8859-1:GL
                vertical_rotate all
        }
}
[...]
XCOMM   fs6 class (Half Kana)
fs6     {
        charset {
                name    JISX0201.1976-0:GR
        }
        font    {
                primary         JISX0201.1976-0:GR
                vertical_rotate all
        }
}
END XLC_FONTSET

So, we first list iso10646-1, followed by various legacy encodings
to act as fallbacks.

I've long known that having iso10646-1 first in this list means that
the fallbacks won't be used if there are any iso10646-1 fonts matching
the fontset, since Xlib i18n considers all fonts to be encoded "solid".
So, the iso10646-1 font is used even if it has no glyphs for the
characters being drawn.

But I realized today that in fact, with this XLC_LOCALE file, the
fallback fonts do no good in any case whatsoever.

The reason why is that in the actual rendering process, the set of
fonts present does not affect what character sets are chosen for
conversion. 

So, even if there is no iso10646-1 font present, the utf-8 string will
still be converted into Unicode-based glyphs, and then these glyphs
will be rendered with whatever font is present, producing junk on
the screen.

There are various bugs in the 'omGeneric' code that make this junk
worse than it needs to be, but that doesn't particularly matter.
The point is that if the locale's fontset lists only iso10646-1, then
when no iso10646-1 font can be loaded, the fontset as a whole fails to
load, which provides maximal information to the application.

My recommendation, then, for the UTF-8 locale files, is that for locales
where iso10646-1 is a reasonable font encoding, we should point to
an en_US.UTF-8 locale that has only iso10646-1 and nothing else.
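
As a rough sketch of what I have in mind (untested, and showing only
the XLC_FONTSET category, with the rest of the file assumed to stay as
it is), the font set section would shrink to just the first class from
the excerpt above:

XLC_FONTSET
XCOMM   fs0 class
fs0     {
        charset {
                name    ISO10646-1
        }
        font    {
                primary ISO10646-1
        }
}
END XLC_FONTSET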

And for other locales (CJK languages), we should have separate UTF-8
XLC_LOCALE files that list the language's encoding first, followed
by iso10646-1.
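
For a Japanese UTF-8 locale, purely as an illustration (the choice of
JISX0208.1983-0 as the language's encoding and the class numbering are
my assumptions here, and this is untested), that ordering might look
something like:

XLC_FONTSET
XCOMM   fs0 class (Kanji, hypothetical choice for this sketch)
fs0     {
        charset {
                name    JISX0208.1983-0:GL
        }
        font    {
                primary JISX0208.1983-0:GL
        }
}
XCOMM   fs1 class (ISO10646-1, listed after the language's encoding)
fs1     {
        charset {
                name    ISO10646-1
        }
        font    {
                primary ISO10646-1
        }
}
END XLC_FONTSET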

Short of a major reworking of the Xlc code, this seems to be
the shortest route to getting reasonable results.

Regards,
                                        Owen