On Sun, 14 Apr 2002 [EMAIL PROTECTED] wrote:
> On Fri, 12 Apr 2002, Hideki Hiura wrote:
>
> > > From: Owen Taylor <[EMAIL PROTECTED]>
> > For example, here is the one used in Solaris for en_US.UTF-8 locale,
> > which I think is virtually identical with the one in X.Org's X11R6.6.x.
>
> en_US.UTF-8 in Solaris below includes ksc5601.1992-3 (JOHAB) and
> you wrote that it's virtually identical to the one in X.Org's
> X11R6.6.x. Does it mean that JOHAB (ksc5601.1992-3) support has been added
> to X11R6.6.x ?
I looked through part of the source of the X11R6.6 reference
implementation at X.org, and support for JOHAB does seem to be there.
> > fs10 {
> > charset KSC5601.1992-3:GLGR
> > font {
> > primary KSC5601.1992-3:GLGR
> When I submitted the font encoding file for ksc5601.1992-3 to include
> in XF86, Juliusz and I talked briefly about including ksc5601.1992-3
> support (beyond just being able to present truetype fonts as in
> ksc5601.1992-3 font encoding with freetype module), but we concluded (or
> rather, he suggested) that we don't have to because iso10646-1 will do
> the job, instead. However, if we follow Owen's suggestion quoted below,
I tried several variants of XLC_LOCALE definitions for ko_KR.UTF-8,
and what I learned for certain is that XF86 4.2 doesn't support
ksc5601.1992-3 beyond being able to present TrueType fonts in that
font encoding.
> I think we'd better have ksc5601.1992-3 support in XF86 as well.
Now I'm less sure whether that would take a whole lot of code.
> Owen> And for other locales (CJK languages), we should have separate UTF-8
> Owen> XLC_LOCALE files that list the language's encoding first, followed
> Owen> by 10646-1 afterwards.
My test results showed me that this isn't going to work for
Korean unless ksc5601.1992-3 support is in place, because
ksc5601.1987-0 has only 2350 of the 11172 Hangul syllables in
Unicode. There's very little point in using ko_KR.UTF-8 if the
character repertoire (as far as Hangul syllables are concerned)
would be the same as in ko_KR.EUC-KR. An alternative to
listing legacy national character sets before iso10646-1
is to use the 'add-style' field of XLFD to label CJK fonts as such
and to explicitly specify 'lang' (ja, zh_TW, zh_CN, ko) in fontsets
for various applications, desktops, etc.
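The repertoire gap mentioned above can be checked roughly outside of X.
The sketch below uses Python's "euc_kr" and "johab" codecs as stand-ins
for ksc5601.1987-0 and ksc5601.1992-3; that pairing, and the helper name
two_byte_hangul_count, are assumptions made for illustration and have
nothing to do with how Xlib itself looks up characters.

```python
# Count how many precomposed Hangul syllables (U+AC00..U+D7A3) a
# legacy encoding can represent as a single two-byte code.
# Assumption: Python's "euc_kr" codec approximates ksc5601.1987-0 and
# "johab" approximates ksc5601.1992-3.

def two_byte_hangul_count(codec):
    count = 0
    for cp in range(0xAC00, 0xD7A4):
        try:
            raw = chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue
        # Ignore anything that is not a plain two-byte code, e.g. a
        # longer jamo make-up sequence that a codec may emit.
        if len(raw) == 2:
            count += 1
    return count


if __name__ == "__main__":
    for codec in ("euc_kr", "johab"):
        print(codec, two_byte_hangul_count(codec))
```

If the two counts differ widely, that is the practical difference
between a ko_KR.UTF-8 locale backed only by ksc5601.1987-0 and one
backed by ksc5601.1992-3.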
A couple of changes need to be made to XLC_LOCALE for en_US.UTF-8
before it is 'recycled' as XLC_LOCALE for CJK UTF-8 locales:
1. iso10646-1 and iso8859-1 should be followed by cs and
fs of the national character set of a target country/region.
That is, in XLC_LOCALE for zh_CN.UTF-8, gb2312.1980-0
should come BEFORE
jisx0208.1983-0, ksc5601.1987-0 and big5. In ko_KR.UTF-8,
ksc5601.1987-0 should come before jisx0208.1983-0,
gb2312.1980-0 and big5. Otherwise, characters common
to these national character sets would be tagged in
CompoundText with the *first* character set listed in
XLC_LOCALE. This leads applications running in
locales with legacy encodings (GB2312/EUC-CN, EUC-KR, etc)
to silently reject those characters when they're
handed over in CompoundText. For instance, U+4E00 ('one')
is in all CJK character sets. If jisx0208.1983-0 is listed
*before* ksc5601.1987-0 in XLC_LOCALE for ko_KR.UTF-8,
an application running under ko_KR.UTF-8 cannot send
the character to an application running under ko_KR.eucKR
locale because U+4E00 would be encoded as
ESC $ B 30 6C ( 0x30 0x6C : U+4E00 in JIS X 0208 GL)
instead of
ESC $ ( C 6C 69 ( 0x6C 0x69 : U+4E00 in KS C 5601 GL)
Of course, this would not
be an issue if UTF8_STRING were used. However, I don't
know how to get XIM servers to try UTF8_STRING before
falling back to COMPOUND_TEXT. (I've been modifying Ami,
a Korean XIM server, to work in ko_KR.UTF-8.)
BTW, I think this would also be an issue for locales like
hu_HU. ISO-8859-2 should be listed before ISO-8859-1:GR in
hu_HU.UTF-8 to avoid losing characters in ISO-8859-1:GR as well as in
ISO-8859-2 when cutting and pasting from an application
running under hu_HU.UTF-8 to an app. under hu_HU.ISO8859-2.
I don't know whether any standard says that UTF-8 should be
treated as the last resort when composing CompoundText.
While testing my patch to make the Korean input method server
Ami work in ko_KR.UTF-8 locale, I found that UTF-8 is
not used in CompoundText encoding unless it's absolutely
necessary, even if iso10646-1 is the first entry in XLC_LOCALE.
(The patch is more or less complete and now I can use it to
enter the full set of Korean syllables in Unicode. It is at
http://jshin.net/faq/ami-1.0.11.utf8.patch.gz)
This is rather nice!! For example, U+AC02 (Hangul Syllable GAGG)
is not in KS C 5601 while U+AC00 is. When I type U+AC00 and
U+AC02 in succession, Ami (modified to work under ko_KR.UTF-8
locale) sends the following compound text string to a client.
ESC $ ( C 30 21 ESC % @ ESC % G EA B0 82 ESC % @
( where '30 21' is for U+AC00 in KS C 5601 GL and 'EA B0 82'
is for U+AC02 in UTF-8)
2. ISO-8859-x character sets other than ISO-8859-1, and other
single-byte character sets, should be placed before any multi-byte
character sets (subject to user preference) to work around a width
problem.
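The ordering effect described in item 1 can be sketched as follows.
This is only an illustration of "first listed charset wins, UTF-8 as
the last resort"; it is not Xlib's converter. The function name
label_char and the table EUC_CODEC are hypothetical, and the use of
Python's EUC codecs to obtain the GL bytes (by clearing the high bit of
a two-byte GR code) is an assumption of the sketch. big5 is left out
because it is not a 94x94 set.

```python
# Hypothetical mapping from X charset names to Python codecs whose
# two-byte output is the GR form of the same 94x94 character set.
EUC_CODEC = {
    "jisx0208.1983-0": "euc_jp",
    "ksc5601.1987-0": "euc_kr",
    "gb2312.1980-0": "gb2312",
}


def label_char(ch, charset_order):
    """Return (charset, bytes in hex) for the first charset in
    charset_order that contains ch; fall back to UTF-8 otherwise."""
    for name in charset_order:
        try:
            raw = ch.encode(EUC_CODEC[name])
        except UnicodeEncodeError:
            continue
        # Only a plain two-byte GR code belongs to the 94x94 set.
        if len(raw) == 2 and all(b >= 0xA1 for b in raw):
            gl = bytes(b & 0x7F for b in raw)  # GR -> GL
            return name, " ".join("%02x" % b for b in gl)
    utf8 = ch.encode("utf-8")
    return "utf-8", " ".join("%02x" % b for b in utf8)


if __name__ == "__main__":
    ja_first = ["jisx0208.1983-0", "ksc5601.1987-0"]
    ko_first = ["ksc5601.1987-0", "jisx0208.1983-0"]
    for order in (ja_first, ko_first):
        print(order[0], "first:", label_char("\u4e00", order))
    for ch in ("\uac00", "\uac02"):
        print("U+%04X" % ord(ch), label_char(ch, ko_first))
```

With the Japanese set listed first, U+4E00 is labeled as JIS X 0208,
which a client in a legacy Korean locale has no reason to accept;
listing the Korean set first avoids that.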
Now, back to specifying the 'lang' code in the add-style field
of XLFD: I have the following lines in fonts.dir in the directory
where the Korean Baekmuk TrueType fonts are installed.
---------
gulim.ttf -baekmuk-gulim-medium-r-normal-ko-0-0-0-0-c-0-iso10646-1
gulim.ttf -baekmuk-gulim-medium-r-normal-ko-0-0-0-0-p-0-iso10646-1
batang.ttf -baekmuk-batang-medium-r-normal-ko-0-0-0-0-c-0-iso10646-1
batang.ttf -baekmuk-batang-medium-r-normal-ko-0-0-0-0-p-0-iso10646-1
gulim.ttf -baekmuk-gulim-medium-r-normal-ko-0-0-0-0-c-0-ksc5601.1992-3
gulim.ttf -baekmuk-gulim-medium-r-normal-ko-0-0-0-0-p-0-ksc5601.1992-3
batang.ttf -baekmuk-batang-medium-r-normal-ko-0-0-0-0-c-0-ksc5601.1992-3
batang.ttf -baekmuk-batang-medium-r-normal-ko-0-0-0-0-p-0-ksc5601.1992-3
.......similar lines for ksc5601.1987-0
----------
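To make it explicit which XLFD field carries the 'ko' label in the
fonts.dir entries above, here is a small sketch that splits such a line
into its 14 XLFD fields. The names Xlfd and parse_fonts_dir_line are
made up for this example, and the parser assumes the simple
"file name, whitespace, XLFD" layout shown above with no extra options.

```python
from collections import namedtuple

# The 14 XLFD fields, in order.
Xlfd = namedtuple(
    "Xlfd",
    "foundry family weight slant setwidth add_style pixel_size "
    "point_size res_x res_y spacing avg_width registry encoding",
)


def parse_fonts_dir_line(line):
    """Split a 'file xlfd' line into (file name, Xlfd fields)."""
    filename, name = line.split(None, 1)
    parts = name.strip().split("-")
    # A full XLFD starts with '-' and has exactly 14 fields.
    if len(parts) != 15 or parts[0] != "":
        raise ValueError("not a full 14-field XLFD: %r" % name)
    return filename, Xlfd(*parts[1:])


if __name__ == "__main__":
    line = ("gulim.ttf -baekmuk-gulim-medium-r-normal-ko-"
            "0-0-0-0-c-0-ksc5601.1992-3")
    filename, xlfd = parse_fonts_dir_line(line)
    print(filename, xlfd.add_style, xlfd.registry, xlfd.encoding)
```

The add_style field is the sixth one, right after setwidth, which is
where 'ko' sits in each of the entries.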
With gtkrc.ko_KR.utf8 shown below, GNOME applications work pretty
well under ko_KR.UTF-8 locale.
---------- /etc/gtk/gtkrc.ko_KR.utf8
style "gtk-default-ko-kr-utf8" {
fontset =
"-*-*-medium-r-normal-ko-14-*-*-*-p-*-iso10646-1,\
-*-*-medium-r-normal-*-14-*-*-*-*-*-*-*"
}
class "GtkWidget" style "gtk-default-ko-kr-utf8"
---------
-------- ~/ko_KR.UTF-8/Xedit
*fontSet: \
-*-*-medium-r-normal-ko-14-*-*-*-p-*-iso10646-1,\
-*-*-medium-r-normal--14-*-*-*-*-*-iso10646-1,\
-*-*-medium-r-normal--14-*-*-*-*-*-*-*
*international: True
*inputMethod: Ami
---------
If I use '*' in place of 'ko', I just have to pray that a Korean
iso10646-1 font is picked up.
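Why '*' leaves the choice to luck can be seen with a rough
approximation of pattern matching. The sketch below uses Python's
fnmatch on whole font names, which is only an approximation of how an
X server matches patterns, and it ignores scalable-font substitution.
Both font names in FONTS are hypothetical, already-scaled names
invented for the example (the Japanese one in particular), and the
list order stands in for whatever order the server happens to use.

```python
from fnmatch import fnmatchcase

# Hypothetical font list; the order stands in for the server's order.
FONTS = [
    "-example-mincho-medium-r-normal-ja-14-140-75-75-p-70-iso10646-1",
    "-baekmuk-gulim-medium-r-normal-ko-14-140-75-75-p-70-iso10646-1",
]


def first_match(pattern, fonts):
    """Return the first font name matching pattern, or None."""
    pattern = pattern.lower()
    for name in fonts:
        if fnmatchcase(name.lower(), pattern):
            return name
    return None


if __name__ == "__main__":
    with_lang = "-*-*-medium-r-normal-ko-14-*-*-*-p-*-iso10646-1"
    without_lang = "-*-*-medium-r-normal-*-14-*-*-*-p-*-iso10646-1"
    print(first_match(with_lang, FONTS))
    print(first_match(without_lang, FONTS))
```

With 'ko' in the add-style field only the Korean font can match; with
'*' the result depends on which iso10646-1 font comes first.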
Summing up, I have two suggestions:
1. In XFree86, XLC_LOCALE files for ll_CC.UTF-8 have to
be *tailored* for each ll_CC so that CompoundText
works between applications running under ll_CC.Legacy
and ll_CC.UTF-8.
2. To work around a nasty width problem, we have to take
advantage of the 'add-style' field of iso10646-1 fonts
to specify the 'lang' of fonts. This should be done
on a few fronts in cooperation: font developers/package
builders and application/desktop developers.
Jungshik Shin
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/