I am currently improving luit to support GBK and TCVN.
While doing so, I found a problem in the GBK mapping table in the
/xc/fonts/encodings/large/gbk-0.enc.gz file.

For example, the gbk-0.enc.gz file has the following line:

    0x82FE  0x8351  0x50BC

This line seems to be intended to express the following (correct) mapping:

    GBK     ISO10646
    0x82FE  0x50BC
    0x8340  0x50BD
    0x8341  0x50BE
    ...
    0x8350  0x50CD
    0x8351  0x50CE

However, since Fontenc does not know that the code following 0x82FE
in GBK is 0x8340, the mapping in this region is wrongly shifted:

    GBK     ISO10646
    0x82FE  0x50BC
    0x8340  0x50FE   (note that 0x8340 - 0x82FE = 0x50FE - 0x50BC = 0x42)
    0x8341  0x50FF
    ...
    0x8351  0x510F
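
The shift can be reproduced with a few lines of Perl.  This is only a
sketch of the behavior described above, assuming a range line
"first last target" is expanded linearly; naive_target is a name made
up for this illustration and is not taken from the Fontenc source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naive linear expansion of a "first last target" range line:
# every code in first..last maps to target + (code - first),
# with no knowledge of the gaps in GBK.  (Hypothetical helper.)
sub naive_target {
    my ($first, $target, $code) = @_;
    return $target + ($code - $first);
}

# Codes taken from the table above.
for my $code (0x82FE, 0x8340, 0x8341, 0x8351) {
    printf "0x%04X  0x%04X\n", $code, naive_target(0x82FE, 0x50BC, $code);
}
```

For 0x8340 this prints 0x50FE rather than the intended 0x50BD, because
the 0x41 codes from 0x82FF to 0x833F, which do not exist in GBK, are
counted as well.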

Another type of discontinuity in the GBK encoding exists between 0x**7E
and 0x**80 (0x**7F is unused).  Thus, the following line in gbk-0.enc.gz

    0x8273  0x8280  0x4FFF

is also a bug.
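
Both kinds of discontinuity can be captured by a "next code" function.
The following is a minimal sketch, assuming GBK trail bytes are
0x40-0x7E and 0x80-0xFE; gbk_next and expand_range are hypothetical
names used only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the GBK code that follows $code, skipping the unused
# trail byte 0x7F and the jump from 0x**FE to the next lead byte.
sub gbk_next {
    my ($code) = @_;
    my $lead  = $code >> 8;
    my $trail = $code & 0xFF;
    if ($trail == 0x7E) {
        return ($lead << 8) | 0x80;        # 0x**7E -> 0x**80
    }
    if ($trail == 0xFE) {
        return (($lead + 1) << 8) | 0x40;  # 0x**FE -> next lead, 0x40
    }
    return $code + 1;
}

# Expand "first last target" into [code, target] pairs, counting
# only codes that exist in GBK.
sub expand_range {
    my ($first, $last, $target) = @_;
    my @pairs;
    for (my $code = $first; $code <= $last; $code = gbk_next($code)) {
        push @pairs, [$code, $target++];
    }
    return @pairs;
}

my @pairs = expand_range(0x82FE, 0x8351, 0x50BC);
printf "0x%04X  0x%04X\n", @$_ for @pairs[0, 1, -1];
```

With this expansion, 0x8340 maps to 0x50BD and 0x8351 to 0x50CE, which
is the mapping the first example line seems to intend.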

Such lines are easily detected using the following script:

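#!/usr/bin/perl
# Print range lines ("first last target") that span a gap in GBK.
use strict;
use warnings;

while (<>) {
        # $h = lead byte, $m = high nibble of the trail byte.
        next unless /0x(..)(.).*0x(..)(.).*0x.*/;
        my ($h1, $m1, $h2, $m2) = ($1, $2, $3, $4);
        # Flag ranges that change lead byte or cross the unused 0x**7F.
        if ($h1 ne $h2 ||
            ($m1 =~ /[0-7]/ && $m2 =~ /[89A-F]/)) {
                print;
        }
}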

The script reports that gbk-0.enc.gz has 119 such bugs.
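
One possible way to repair such a line would be to split it at each
gap, so that every resulting range stays within one contiguous run of
trail bytes.  The sketch below assumes the GBK trail byte ranges
0x40-0x7E and 0x80-0xFE, and that a range line whose first and last
codes are equal is acceptable; split_range is a hypothetical helper,
not an existing tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one "first last target" range into sub-ranges that do not
# cross a gap in GBK.  Assumes $first is a valid GBK code.
sub split_range {
    my ($first, $last, $target) = @_;
    my @out;
    while ($first <= $last) {
        my $lead  = $first >> 8;
        my $trail = $first & 0xFF;
        # Last code of the contiguous run that contains $first.
        my $run_end = ($lead << 8) | ($trail <= 0x7E ? 0x7E : 0xFE);
        my $end = $run_end < $last ? $run_end : $last;
        push @out, sprintf("0x%04X  0x%04X  0x%04X", $first, $end, $target);
        $target += $end - $first + 1;
        # First code of the next run.
        if (($run_end & 0xFF) == 0x7E) {
            $first = ($lead << 8) | 0x80;
        } else {
            $first = (($lead + 1) << 8) | 0x40;
        }
    }
    return @out;
}

print "$_\n" for split_range(0x82FE, 0x8351, 0x50BC);
```

For the first example line this yields one range for 0x82FE alone and
one for 0x8340 through 0x8351 starting at 0x50BD.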

Is this a known bug, and is anyone already working on it?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
_______________________________________________
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n
