Re: [I18n] re: a new font encoding file for XF86 : gb18030.2000-2? (fwd)

2003-07-09 Thread Juliusz Chroboczek
YL 1. To my knowledge the gb18030.2000-0 and gb18030.2000-1 encodings were
YL invented by Sun and are used in their Solaris 9. The only application on Linux
YL that supports them is Mozilla (maybe Java 1.4 as well?) at the request of
YL Sun (see Mozilla bug 72525). IMHO, if you want to extend the system by adding
YL something such as gb18030.2000-2, it's probably a good idea to consult with Sun
YL so that it will be compatible with any potential extension of Sun's own.

1. Sun implements GB18030.2000.
2. Mozilla implements GB18030.2000 for compatibility with Sun.
   Because Mozilla is cross-platform, the support finds itself on Linux.
3. XFree86 should implement GB18030.2000 for compatibility with
   Mozilla.

Interesting process.

Juliusz

P.S. Not that I care either way.
___
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n


Re: [I18n] re: a new font encoding file for XF86 : gb18030.2000-2?(fwd)

2003-07-08 Thread Jungshik Shin
On Sun, 6 Jul 2003, Yong Li wrote:

Hi Rigel,

Thanks for your kind comments. I fully agree with you on most, if not all,
points.


> 1. To my knowledge the gb18030.2000-0 and gb18030.2000-1 encodings were
> invented by Sun and are used in their Solaris 9. The only application on Linux

As I wrote on bugzilla (I guess you wrote your reply before I added
my latest comment to XF86 bugzilla), I was led astray by the presence
of a gb18030.2000-1.enc file on my RH 8.0 system. I couldn't connect to the
XF86 CVS and assumed that it was what XF86 had. It turned out that the file
was RH-specific and had not been committed to XF86.

> that supports them is Mozilla (maybe Java 1.4 as well?) at the request of
> Sun (see Mozilla bug 72525).

Mozilla's GB18030Font1 encoder (Unicode -> gb18030.2000-1) does not cover
some (usually) 'single-width' characters such as the Euro sign and Latin-1
characters; see
http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvcn/nsUnicodeToGBK.cpp#180
I believe this exclusion was done on purpose to avoid rendering those
characters with 'double-width' Chinese glyphs (there's another, built-in
protection against this in the Mozilla code, though). This is another point
where I was misled. I should have tracked down the bug(s) for which this
encoder was added, as you have done.

> 4. The gb18030.2000-1.enc.gz file included in RedHat 9 is totally weird.
> I cannot figure out what it is.

RH8's gb18030.2000-1.enc file (I don't know whether it's the same
as RH9's) appears to represent a straight identity mapping for a subset
of BMP characters. I didn't bother to figure out exactly what subset is
covered. (All the Chinese characters and the characters for Chinese minority
scripts, such as Yi and Mongolian, are included.)
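
For anyone who wants to repeat this kind of check, here is a minimal Python
sketch of testing whether an encoding file is an identity mapping. It assumes
a simplified reading of the fontenc .enc format: only two-column
"index code-point" lines inside a STARTMAPPING unicode block are parsed, and
range lines, UNDEFINE entries and other mapping types are ignored. The
function names and the sample text are made up for illustration and are not
taken from the RH8 file.

```python
def parse_unicode_mapping(enc_text):
    """Return {font_index: code_point} from two-column lines of a
    'STARTMAPPING unicode' block (simplified; ranges are ignored)."""
    mapping = {}
    in_unicode_block = False
    for raw in enc_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on blanks
        if not fields:
            continue
        keyword = fields[0].upper()
        if keyword == "STARTMAPPING":
            in_unicode_block = len(fields) > 1 and fields[1].lower() == "unicode"
        elif keyword == "ENDMAPPING":
            in_unicode_block = False
        elif in_unicode_block and len(fields) == 2:
            try:
                mapping[int(fields[0], 0)] = int(fields[1], 0)
            except ValueError:
                continue  # e.g. 'UNDEFINE 0x80'; not handled in this sketch
    return mapping


def is_identity(mapping):
    """True if the mapping is non-empty and every index maps to itself."""
    return bool(mapping) and all(k == v for k, v in mapping.items())


# Hypothetical sample, not the contents of any shipped file.
SAMPLE = """\
STARTENCODING example-identity
STARTMAPPING unicode
0x3400 0x3400
0xA000 0xA000
ENDMAPPING
ENDENCODING
"""

print(is_identity(parse_unicode_mapping(SAMPLE)))
```

A GBK-style line such as "0x8140 0x4E02" would make is_identity() return
False, since the font index and the code point differ.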


> IMHO, if you want to extend the system by adding
> something such as gb18030.2000-2, it's probably a good idea to consult with Sun
> so that it will be compatible with any potential extension of Sun's own.

Now that my misunderstanding of what Roland suggested in XF86
bugzilla has been cleared up, there's no need for that. I raised the
possibility of gb18030.2000-2 because I mistakenly thought that attachment 348
represented a new font encoding distinct from the existing
gb18030.2000-1 (which I thought had been well-established). If they're
different and cover disjoint sets of characters, they need to have
different names. As I wrote above, what Roland suggested had been used
by Solaris and Mozilla (and very likely by Turbo Linux), while what I
thought was well-established turned out to be RH-specific.


> Personally though I don't think the new font encoding is needed, as we are
> rapidly moving away from the core font technologies (at least in the
> XFree86 world). For any application that does support non-BMP characters,
> most likely it already uses Xft/fontconfig anyway.

Absolutely. I have no intention of extending the life of the 10+ year old,
not-so-flexible XLFD-based font selection mechanism. The introduction of
Xft/fontconfig is one of the best things that has happened to X11 (although
fontconfig is not just for X11).


> seems to be the requirement of the GB18030 conformance test. The Standard,
> however, has defined all the mappings between GB18030 and every code point
> in the UTF-16 space. It's unclear (to me at least) what exactly constitutes
> legal GB18030 codes. Attachment 348 seems to include every BMP code
> point that is not in gb18030.2000-0. I think it's sometimes useful to
> know whether a code is a non-existent character or a legal code that does
> not exist in a certain font, so I suggest removing the unassigned BMP code
> points from that file.

Hmm, that's an interesting point. I guess GB18030 is supposed to
have exactly the same repertoire as ISO 10646. To keep it in sync with
Unicode/10646 without playing a catch-up game with 10646/Unicode,
it's better to cover all legal code points, assigned or not, that are also
valid in GB18030. As you wrote, fewer and fewer people will bother
themselves with X11 core fonts as time goes by.
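
To illustrate how GB18030 reaches the whole ISO 10646 repertoire, the Python
sketch below computes the code point for a four-byte GB18030 sequence in the
supplementary-plane range, where the mapping is a plain linear one starting
at 0x90308130 = U+10000, and compares the result with Python's built-in
gb18030 codec. The function name is hypothetical, and the four-byte range
used for BMP characters follows a different, range-based scheme that is not
covered here.

```python
def supplementary_from_gb18030(seq: bytes) -> int:
    """Map a four-byte GB18030 sequence in 0x90308130..0xE3329A35
    to its code point in U+10000..U+10FFFF (linear mapping)."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (0x90 <= b1 <= 0xE3 and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a supplementary-plane four-byte sequence")
    # Bytes 2 and 4 take 10 values each, byte 3 takes 126 values.
    index = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    code_point = 0x10000 + index
    if code_point > 0x10FFFF:
        raise ValueError("sequence is beyond U+10FFFF")
    return code_point


# Cross-check a few code points against Python's own gb18030 codec.
for cp in (0x10000, 0x20000, 0x10FFFF):
    encoded = chr(cp).encode("gb18030")
    print(encoded.hex(), hex(supplementary_from_gb18030(encoded)))
```

Because this part of the mapping is arithmetic rather than table-driven, it
covers unassigned code points as well, which is what allows GB18030 to stay
in step with Unicode without revisions.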

> Also the STARTMAPPING cmap 3 4 entry at the end
> should be removed because it's obviously not an identical mapping.

Yup. Perhaps Roland just copied it from gb18030.2000-0.enc or gbk-0.enc.


> 3. The gb18030.2000-0 file is probably not needed. Yes, it's true that the
> two-byte codes in GB18030 are slightly different from GBK. Also, 80 code
> points that are mapped to the PUA in GBK got official assignments in later
> Unicode standards, and GB18030 adopted the new mappings. However, that doesn't
> mean gb18030.2000-0 uses the new mappings, because Sun could opt to keep backward
> compatibility with GBK fonts by making gb18030.2000-0 and gbk the same. Judging by
> the comments posted on Mozilla bugzilla by engineers from Sun, that is probably
> indeed the case (see e.g. bugs 72525 and 81200). It would be nice if someone
> from Sun could confirm this.

Yes, that would be nice. Actually, that is what's done in RH 8.0
(the gbk-0.enc file has a line making it an alias to gb18030.2000-0).
Even if they're not made 

Re: [I18n] re: a new font encoding file for XF86 : gb18030.2000-2?(fwd)

2003-07-08 Thread Yu Shao
Some explanations of RedHat's GB18030.2000*.enc:

Because the compound text encoding part of RedHat's XFree86 18030 patch was
based on James Su's patch, which was derived from the UTF-8 code, it doesn't
really need GB18030.2000-0.enc and GB18030.2000-1.enc to function.
The GB18030.2000* aliases were added purely because we wanted Mozilla to
work properly as well.
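
As a rough illustration of how such aliases can be inspected, the Python
sketch below lists the names an encoding file answers to by collecting its
STARTENCODING and ALIAS lines. The helper name is hypothetical, and the
sample text is an assumed reconstruction of a gbk-0.enc header carrying a
gb18030.2000-0 alias, not a copy of the RedHat file.

```python
def encoding_names(enc_text):
    """Return the primary name and any aliases declared in .enc text."""
    names = []
    for raw in enc_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on blanks
        if len(fields) >= 2 and fields[0].upper() in ("STARTENCODING", "ALIAS"):
            names.append(fields[1].lower())
    return names


# Assumed header for illustration only.
SAMPLE_HEADER = """\
STARTENCODING gbk-0
ALIAS gb18030.2000-0
SIZE 0xFF 0xFF
STARTMAPPING unicode
0x8140 0x4E02
ENDMAPPING
ENDENCODING
"""

print(encoding_names(SAMPLE_HEADER))
```

With an alias like this, a font requested as gb18030.2000-0 is served by the
GBK mapping, which is the behaviour the thread is discussing.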

As for the identity mapping in RedHat's GB18030.2000-1: it is there because
the internal compound text encoding part treats them as ISO 10646 codes.

Regards,
Yu Shao

Re: [I18n] re: a new font encoding file for XF86 : gb18030.2000-2?(fwd)

2003-07-08 Thread Jungshik Shin
On Tue, 8 Jul 2003, Yu Shao wrote:

Thanks for your answer.

> Because the compound text encoding part of RedHat's XFree86 18030 patch was
> based on James Su's patch, which was derived from the UTF-8 code, it doesn't
> really need GB18030.2000-0.enc and GB18030.2000-1.enc to function.
> The GB18030.2000* aliases were added purely because we wanted Mozilla to
> work properly as well.

Aliasing gbk-0.enc to gb18030.2000-0.enc is fine except for the 80 characters
with different assignments in the two encodings. However,
gb18030.2000-1.enc in RH8 is different from Mozilla's GB18030Font1.
Mozilla's GB18030Font1 is based on the gb18030.2000-1 used in Solaris 9
(which is the same as attachment 348 in XF86 bugzilla and what James
Su proposed adding in December 2002). So the last
sentence in the above paragraph doesn't seem to make sense. On top of
that, RedHat 8/9 ships an Xft build of Mozilla by default, so
Mozilla's encoders for X11 core fonts shouldn't be your concern,
should they? Of course, when it's run with GDK_USE_XFT=0, it does matter.

> As for the identity mapping in RedHat's GB18030.2000-1: it is there because
> the internal compound text encoding part treats them as ISO 10646 codes.

This is a bit confusing. How am I supposed to interpret this together
with the first sentence in your reply? Do you need RH8's
version of gb18030.2000-1.enc or not?

  How would you propose the conflict between RH's gb18030.2000-1.enc and
Solaris/Mozilla/Java's gb18030.2000-1 be solved?  Could you add your
comment to http://bugs.xfree86.org//cgi-bin/bugzilla/show_bug.cgi?id=441 ?


  Jungshik



Re: [I18n] re: a new font encoding file for XF86 : gb18030.2000-2?(fwd)

2003-07-07 Thread Yong Li
Hello Jungshik,

I have a few comments.

1. To my knowledge the gb18030.2000-0 and gb18030.2000-1 encodings were
invented by Sun and are used in their Solaris 9. The only application on Linux
that supports them is Mozilla (maybe Java 1.4 as well?) at the request of
Sun (see Mozilla bug 72525). IMHO, if you want to extend the system by adding
something such as gb18030.2000-2, it's probably a good idea to consult with Sun
so that it will be compatible with any potential extension of Sun's own.
Personally though I don't think the new font encoding is needed, as we are
rapidly moving away from the core font technologies (at least in the
XFree86 world). For any application that does support non-BMP characters,
most likely it already uses Xft/fontconfig anyway.

2. I believe Sun's own gb18030.2000-1 has only somewhat fewer than 7000 codes,
including CJK Ext. A and code points for 4 Chinese minority scripts. That
seems to be the requirement of the GB18030 conformance test. The Standard,
however, has defined all the mappings between GB18030 and every code point
in the UTF-16 space. It's unclear (to me at least) what exactly constitutes
legal GB18030 codes. Attachment 348 seems to include every BMP code
point that is not in gb18030.2000-0. I think it's sometimes useful to
know whether a code is a non-existent character or a legal code that does
not exist in a certain font, so I suggest removing the unassigned BMP code
points from that file. Also the STARTMAPPING cmap 3 4 entry at the end
should be removed because it's obviously not an identical mapping.

3. The gb18030.2000-0 file is probably not needed. Yes, it's true that the
two-byte codes in GB18030 are slightly different from GBK. Also, 80 code
points that are mapped to the PUA in GBK got official assignments in later
Unicode standards, and GB18030 adopted the new mappings. However, that doesn't
mean gb18030.2000-0 uses the new mappings, because Sun could opt to keep backward
compatibility with GBK fonts by making gb18030.2000-0 and gbk the same. Judging by
the comments posted on Mozilla bugzilla by engineers from Sun, that is probably
indeed the case (see e.g. bugs 72525 and 81200). It would be nice if someone
from Sun could confirm this.

4. The gb18030.2000-1.enc.gz file included in RedHat 9 is totally weird.
I cannot figure out what it is.
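
The suggestion in point 2, removing unassigned BMP code points from a
mapping, could be sketched in Python as follows. This is only an
illustration: it relies on the Unicode version bundled with the Python
interpreter (not the one current in 2003), treats general category "Cn" as
"unassigned", and keeps private-use and surrogate code points. The function
names are hypothetical.

```python
import unicodedata


def is_assigned(code_point):
    """True unless the code point has general category Cn (unassigned)."""
    return unicodedata.category(chr(code_point)) != "Cn"


def drop_unassigned(mapping):
    """Keep only entries whose target code point is assigned."""
    return {index: cp for index, cp in mapping.items() if is_assigned(cp)}


# U+4E00 is a CJK ideograph; U+FFFF is a noncharacter with category Cn.
example = {0x4E00: 0x4E00, 0xFFFF: 0xFFFF}
print(drop_unassigned(example))
```

A font lookup against the filtered mapping can then distinguish "no such
character" from "legal character, but missing from this font".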

Regards,
rigel



On Sun, 6 Jul 2003, Jungshik Shin wrote:

> Hi,
>
> I sent the following to James Su to seek his opinion, but it bounced. Now
> I'm sending it to the i18n and fonts lists, hoping that he or other Chinese
> experts will pick this up.
>
> Jungshik
>
> Hi,
>
> Could you comment on
> http://bugs.xfree86.org//cgi-bin/bugzilla/show_bug.cgi?id=441?
>
> It's about adding a new font encoding file to XF86 for BMP characters
> NOT covered by gbk-0/gb18030.2000-0.enc and gb18030.2000-1.enc that you
> proposed and was/were accepted. I don't think it's necessary, but your
> expert opinion would be great to have. I tried to add you to the CC list on
> bugzilla, but you're not registered there, so I'm writing this instead.
>
> Thank you,
>
> Jungshik
 