I am currently working on improving luit to support GBK and TCVN.
While doing so, I found a problem in the GBK mapping table in the
/xc/fonts/encodings/large/gbk-0.enc.gz file.
For example, the gbk-0.enc.gz file has the following line:
0x82FE 0x8351 0x50BC
This line appears to be intended to express the following (correct) mapping:
GBK ISO10646
0x82FE 0x50BC
0x8340 0x50BD
0x8341 0x50BE
...
0x8350 0x50CD
0x8351 0x50CE
However, since Fontenc does not know that the code following 0x82FE
in GBK is 0x8340, the mapping in this region is shifted
incorrectly:
GBK ISO10646
0x82FE 0x50BC
0x8340 0x50FE (note that 0x8340 - 0x82FE = 0x50FE - 0x50BC = 0x42)
0x8341 0x50FF
...
0x8351 0x510F
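
To make the shift concrete, here is a small Python sketch comparing the two interpretations of the line above. It is only an illustration and not Fontenc's actual code; the function names are made up for this example. It assumes the usual GBK layout in which trail bytes run over 0x40..0x7E and 0x80..0xFE (190 values per lead byte), and that the Unicode targets of the range are meant to be consecutive.

```python
def gbk_index(code):
    """Position of a two-byte GBK code when counting only valid trail bytes."""
    lead, trail = code >> 8, code & 0xFF
    # Trail bytes run 0x40..0x7E and 0x80..0xFE: 190 values, 0x7F is skipped.
    if not (0x40 <= trail <= 0xFE) or trail == 0x7F:
        raise ValueError("invalid GBK trail byte: 0x%02X" % trail)
    col = trail - 0x40 - (1 if trail > 0x7F else 0)
    return lead * 190 + col


def naive_target(start, base, code):
    """Treat the range as numerically contiguous (the wrongly shifted result)."""
    return base + (code - start)


def gbk_target(start, base, code):
    """Count only valid GBK codes between start and code (the intended result)."""
    return base + (gbk_index(code) - gbk_index(start))


if __name__ == "__main__":
    # The line "0x82FE 0x8351 0x50BC" from gbk-0.enc.gz.
    for code in (0x82FE, 0x8340, 0x8341, 0x8351):
        print("0x%04X  naive=0x%04X  intended=0x%04X" % (
            code,
            naive_target(0x82FE, 0x50BC, code),
            gbk_target(0x82FE, 0x50BC, code),
        ))
```

The naive column reproduces the shifted values listed above, and the intended column reproduces the correct mapping.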
Another type of discontinuity in the GBK encoding exists between 0x**7E
and 0x**80, because 0x7F is not a valid trail byte. Thus, the following
line in gbk-0.enc.gz
0x8273 0x8280 0x4FFF
is also a bug.
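
One possible repair is to split each such line into several range lines, none of which crosses a gap. The Python sketch below shows the idea. It is not an existing tool, the names next_gbk and split_range are hypothetical, and it assumes that the Unicode targets are consecutive across the gap, as in the 0x82FE example.

```python
def next_gbk(code):
    """Return the GBK code that follows `code`, skipping invalid trail bytes."""
    lead, trail = code >> 8, code & 0xFF
    if trail == 0x7E:          # 0x7F is not a valid trail byte
        return (lead << 8) | 0x80
    if trail == 0xFE:          # the next lead byte starts again at trail 0x40
        return ((lead + 1) << 8) | 0x40
    return code + 1


def split_range(start, end, base):
    """Split one 'start end base' line into numerically contiguous lines."""
    lines = []
    run_start, run_base = start, base
    code, target = start, base
    while True:
        if code > end:
            raise ValueError("end is not reachable from start via valid GBK codes")
        nxt = next_gbk(code)
        if code == end or nxt != code + 1:
            # Close the current run at `code`.
            lines.append("0x%04X 0x%04X 0x%04X" % (run_start, code, run_base))
            if code == end:
                return lines
            run_start, run_base = nxt, target + 1
        code, target = nxt, target + 1


if __name__ == "__main__":
    for line in split_range(0x8273, 0x8280, 0x4FFF):
        print(line)
    for line in split_range(0x82FE, 0x8351, 0x50BC):
        print(line)
```

Under these assumptions the first line would become "0x8273 0x827E 0x4FFF" and "0x8280 0x8280 0x500B", and the 0x82FE line would become "0x82FE 0x82FE 0x50BC" and "0x8340 0x8351 0x50BD". The targets should still be checked against an authoritative GBK table before changing the file.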
Such lines are easily detected using the following script:
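#!/usr/bin/perl
# Print range lines whose start and end codes straddle a GBK discontinuity.
use strict;
use warnings;
while (<>) {
    # Capture the lead byte and the high nibble of the trail byte of both codes.
    next unless /0x(..)(.).*0x(..)(.).*0x/;
    my ($h1, $m1, $h2, $m2) = ($1, $2, $3, $4);
    # Flag ranges that change lead byte or cross the 0x7F gap in the trail byte.
    if (lc($h1) ne lc($h2) ||
        ($m1 =~ /^[0-7]$/ && $m2 =~ /^[89A-Fa-f]$/)) {
        print;
    }
}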
The script reports that gbk-0.enc.gz has 119 such bugs.
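
For anyone who wants to repeat the check without unpacking the file, the same test can be sketched in Python, reading the gzipped file directly. This is a rough equivalent and not a line-for-line port: it compares whole bytes instead of nibbles and only looks at lines of the form "0xSSSS 0xEEEE 0xTTTT", so its count is not guaranteed to match the Perl script's. The function names are made up for this example.

```python
import gzip
import re

# A range line: four-digit start code, four-digit end code, then the target.
RANGE_LINE = re.compile(
    r"^\s*0x([0-9A-Fa-f]{4})\s+0x([0-9A-Fa-f]{4})\s+0x[0-9A-Fa-f]+"
)


def is_suspect(line):
    """True if a range line changes lead byte or crosses the 0x7F trail gap."""
    m = RANGE_LINE.match(line)
    if not m:
        return False
    start, end = int(m.group(1), 16), int(m.group(2), 16)
    same_lead = (start >> 8) == (end >> 8)
    crosses_7f = (start & 0xFF) < 0x7F < (end & 0xFF)
    return not same_lead or crosses_7f


def count_suspects(path):
    """Count suspect range lines in a gzipped encoding file."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as f:
        return sum(1 for line in f if is_suspect(line))
```

Calling count_suspects("gbk-0.enc.gz") on a local copy of the file gives the number of flagged lines.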
Is this an already-known bug, and is anyone working on it?
---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N" http://www.debian.org/doc/manuals/intro-i18n/
_______________________________________________
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n