On Sun, 6 Jul 2003, Yong Li wrote:

Hi Rigel,

Thanks for your kind comments. I fully agree with you on most, if not all,
points.


> 1. To my knowledge the gb18030.2000-0 and gb18030.2000-1 encodings are
> invented by Sun and used in their Solaris 9. The only application on Linux

As I wrote on bugzilla (I guess you wrote your reply before I added
my latest comment to XF86 bugzilla), I was led astray by the presence
of a gb18030.2000-1.enc file on my RH 8.0 system. I couldn't connect to the
XF86 CVS and assumed that the file was what XF86 has. It turned out that it
was RH-specific and had not been committed to XF86.

> that supports them is Mozilla (maybe Java1.4 as well?) at the request of
> Sun (see mozilla bug 72525).

Mozilla's GB18030Font1 encoder (Unicode -> gb18030.2000-1) does not cover
some (usually) 'single-width' characters such as the Euro sign and Latin-1
characters (see
http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvcn/nsUnicodeToGBK.cpp#180
). I believe this exclusion was made on purpose to avoid rendering those characters
with 'double-width' Chinese glyphs (there's another, built-in, protection against
this in the Mozilla code, though). This is another point where I was misled. I should
have tracked down the bug(s) for which this encoder was added (as you have done).

> 4. The gb18030.2000-1.enc.gz file included in RedHat 9 is totally weird.
> I can not figure out what it is.

RH8's gb18030.2000-1.enc file (I don't know whether it's the same
as RH9's) appears to represent a straight identity mapping for a subset
of BMP characters. Exactly what subset is covered I didn't bother to figure
out.  (All the Chinese characters and characters for Chinese minority scripts
- Yi, Mongolian, etc - are included.)
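
For anyone who does want to check which entries of such a file are identity
mappings, here is a minimal sketch. It assumes the mapping lines take the
form "a b" (code a maps to b) or "a b c" (the range a..b maps to c onward),
with numbers in hex or decimal, and it simply skips keyword lines such as
STARTMAPPING or UNDEFINE. The sample text and the function names are made up
for illustration; they are not taken from the RH file.

```python
def parse_mapping_lines(text):
    """Return {font_index: code_point} from 'a b' and 'a b c' mapping lines.

    Assumption: numbers are written as 0x... hex or plain decimal. Keyword
    lines (STARTENCODING, STARTMAPPING, UNDEFINE, ...) are skipped, so
    UNDEFINE is NOT honoured by this sketch.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        try:
            nums = [int(field, 0) for field in line.split()]
        except ValueError:
            continue  # a keyword line, not a mapping line
        if len(nums) == 2:
            mapping[nums[0]] = nums[1]
        elif len(nums) == 3:
            first, last, target = nums
            for offset in range(last - first + 1):
                mapping[first + offset] = target + offset
    return mapping


def non_identity_entries(mapping):
    """Return the entries whose font index differs from the code point."""
    return {k: v for k, v in mapping.items() if k != v}


# Made-up sample, only to exercise the two functions above.
SAMPLE = """
STARTENCODING example-identity
STARTMAPPING unicode
0x4E00 0x4E02 0x4E00   # range mapped onto itself
0xA000 0xA000          # single identity entry
0x0080 0x20AC          # not an identity entry
ENDMAPPING
ENDENCODING
"""

if __name__ == "__main__":
    table = parse_mapping_lines(SAMPLE)
    for index, cp in sorted(non_identity_entries(table).items()):
        print(f"0x{index:04X} -> U+{cp:04X}")
```

Running the same two functions over the contents of a real .enc file would
list whatever is not an identity entry, and the keys of the parsed table give
the covered subset.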


>  IMHO, if you want to extend the system to add
> such as gb18030.2000-2, it's probably a good idea to consult with Sun just
> so that it will be compatible with any potential Sun's own extension.

 With my misunderstanding of what Roland suggested in XF86
bugzilla cleared up, there's no need for that. I raised the possibility
of gb18030.2000-2 because I mistakenly thought that attachment 348
represented a new font encoding distinct from the existing
gb18030.2000-1 (which I thought had been well established). If they're
different and cover disjoint sets of characters, they need to have
different names. As I wrote above, what Roland suggested had been used
by Solaris and Mozilla (and very likely by Turbo Linux) while what I
thought was well established turned out to be RH-specific.


> Personally though I don't think the new font encoding is needed, as we are
> rapidly moving away from the core font technologies (at least in the
> XFree86 world). For any application that does support non-BMP characters,
> most likely it already uses Xft/fontconfig anyway.

  Absolutely. I have no intention of extending the life of the 10+ year old,
not-so-flexible XLFD-based font selection mechanism. The introduction of
Xft/fontconfig is one of the best things that has happened to X11 (although
fontconfig is not just for X11).


> seems to be the requirement of GB18030 conformance test. The Standard
> however has defined all the mappings between GB18030 and every code point
> in UTF-16 space. It's unclear (to me at least) what exactly constitutes a
> legal GB18030 code. The attachment 348 seems to include every BMP code
> point that is not in gb18030.2000-0.  I think sometimes it's useful to
> know whether a code is a non-existent character or a legal code that does
> not exist in a certain font. So I suggest removing the unassigned BMP code
> points from that file.

 Hmm, that's an interesting point. I guess GB18030 is supposed to
have exactly the same repertoire as ISO 10646. To keep the encoding in
sync without playing a catch-up game with 10646/Unicode,
it's better to cover all legal code points - assigned or not - that are
also valid in GB18030. As you wrote, fewer and fewer people would bother
themselves with X11 core fonts as time goes by....
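
For the supplementary planes, at least, no table has to chase 10646 at all,
because the four-byte GB18030 codes for U+10000..U+10FFFF are assigned by
plain arithmetic. A minimal sketch follows (the function name is my own);
it can be cross-checked against the "gb18030" codec that ships with Python:

```python
def supplementary_to_gb18030(cp):
    """Encode a supplementary-plane code point as four GB18030 bytes.

    The four-byte codes for U+10000..U+10FFFF start at 90 30 81 30 and are
    assigned linearly: byte 1 counts blocks of 12600, byte 2 blocks of 1260,
    byte 3 blocks of 10, and byte 4 the remainder.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    linear = cp - 0x10000
    b1, rest = divmod(linear, 12600)
    b2, rest = divmod(rest, 1260)
    b3, b4 = divmod(rest, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


if __name__ == "__main__":
    for cp in (0x10000, 0x2000B, 0x10FFFF):
        ours = supplementary_to_gb18030(cp)
        # Compare with the codec from the Python standard library.
        print(f"U+{cp:06X}", ours.hex(), ours == chr(cp).encode("gb18030"))
```

The BMP part is different: there the four-byte codes fill the gaps left by
the two-byte table, so a lookup table (or a list of ranges) is still needed.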

> Also the "STARTMAPPING cmap 3 4" entry at the end
> should be removed because it's obviously not an identical mapping.

  Yup. Perhaps Roland just copied it from gb18030.2000-0 or gbk-0.enc.


> 3. The gb18030.2000-0 file is probably not needed. Yes, it's true that the
> two-byte codes in GB18030 are slightly different from GBK. There are also 80
> code points that were mapped to PUA in GBK but got official assignments in later
> Unicode standards, and GB18030 adopted the new mappings. However that doesn't
> mean gb18030.2000-0 uses the new mappings because Sun could opt to keep backward
> compatibility with GBK fonts by making gb18030.2000-0 and gbk the same. Judging by
> the comments posted on Mozilla bugzilla by engineers from Sun it is probably
> indeed the case (see e.g. bug 72525 and 81200). It would be nice if someone
> from Sun could confirm this.

   Yes, that would be nice. Actually, that is what's done in RH 8.0
(the gbk-0.enc file has a line making it an alias to gb18030.2000-0).
Even if they're not made compatible, the 80 characters with different
code point assignments may as well be taken care of 'algorithmically'
of instead of adding a new encoding file.
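
By 'algorithmically' I mean something along these lines: keep the one
GBK table and consult a small override table first. This is only a sketch.
The function name is hypothetical, and the override entry below is a
placeholder, NOT one of the real 80 code points.

```python
def lookup(font_index, base, overrides):
    """Map a font index to a code point, preferring the override table."""
    if font_index in overrides:
        return overrides[font_index]
    return base.get(font_index)  # None when the index is not mapped


# Toy tables for illustration only.
GBK_BASE = {
    0xA1A1: 0x3000,  # ideographic space
    0xFE50: 0xE000,  # placeholder: an index that GBK sends to the PUA
}
GB18030_OVERRIDES = {
    0xFE50: 0x4E00,  # placeholder: its later "official" assignment
}

if __name__ == "__main__":
    for index in (0xA1A1, 0xFE50):
        cp = lookup(index, GBK_BASE, GB18030_OVERRIDES)
        print(f"0x{index:04X} -> U+{cp:04X}")
```

With an empty override table the lookup behaves exactly like plain GBK, so
the same font and encoding file would serve both names.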


  Jungshik

_______________________________________________
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n
