Pablo Saratxaga wrote:

>Kaixo!
>
>On Sat, Jun 29, 2002 at 05:17:04PM -0700, Keith Packard wrote:
>
>>>What are those glyphs? (I'm quite surprised; I would have expected the
>>>opposite: fonts generally have more glyphs than the standard encodings of
>>>the ISO-8859 family, for example.)
>>>
>>My definition of language tag is coloured by the OS/2 table codePageRange
>>bits from which it was originally defined in fontconfig. Those bits are
>>defined to map to specific Windows code pages; the Latin-1 case doesn't
>>map to ISO 8859-1, but rather to code page 1252, for which many fonts are
>>missing a few random entries.
>>
>
>But what characters are those?
>Is it possible that they are the ones that have been added to cp1252
>and that didn't exist some years ago?
>I think the matching should be done against the lowest common denominator
>and be strict, or else give different weights to missing *letters* versus
>missing symbols (it may be more or less acceptable to get quotation
>marks from another font; bUt lEttErs frOm A dIffErEnt fOnts Is vErY UglY).
>
>>>No, the tolerance for missing glyphs in CJK tests should be the same or
>>>even smaller. The difference is that it isn't necessary to test all the
>>>glyphs for CJK coverage; testing only a set of 256 chosen glyphs would be
>>>enough (if they are correctly chosen, testing that those 256 glyphs are
>>>present in a font is enough to assure, with 99.99% confidence, that it
>>>covers a given CJK language).
>>>
>>I'm not confident enough of this approach; I fear that any set of 256
>>glyphs that must appear in a simplified Chinese font may well appear in
>>many traditional Chinese (or even Japanese) fonts.
>>
>
>Most do, of course, but there are a lot that don't.
>I have only dealt with ~10-15 TTF CJK fonts, but never had false positives
>using that method.
>
>>>out there that doesn't encode all the characters of gb2312?
>>>
>>It seems that this must be the case -- I set the '500' number so high
>>because all of the fonts which I have that advertise support for
>>simplified Chinese are missing over 200 glyphs from GB2312. I got
>>similar results for Japanese fonts, Korean Wansung fonts and traditional
>>Chinese fonts.
>>
>
>But which characters are missing?
>Could it be that they are semi-graphic ones, or scripts used by other
>languages (e.g. Cyrillic, Greek, Japanese kana in a Chinese font, etc.)?
>Here too, different weights should be used: it is not a big problem if
>a CJK font is missing Cyrillic (a font designed for Russian will be a much
>better choice to render Cyrillic anyway), but it may be a big problem if
>some needed characters are missing.
>
>And I'm really surprised by a number as high as 200.
>Are you sure you tested against gb2312 and not against the Microsoft
>code page based on it (which surely adds several extra characters)?
>

Hi Keith,

Checking against fontenc, both "AR PL SungtiL GB" and "AR PL KaitiM GB"
provide all 7445 of GB2312's characters, which comprise 6763 Hanzi and
682 symbols. So fc-cache's report of 204 missing characters seems to be
incorrect?

Regards,

>
>
>>>But to handle such a case, I think it would be better to choose a given
>>>definition of "big5" (or several of them) and stick to it, rather than
>>>allowing such a tremendously big hole as 500 possible missing chars.
>>>
>>Missing 500 from a repertoire of nearly 20000 doesn't seem to render most
>>of these fonts unusable.
>>
>
>It could; it depends on which glyphs are missing.
>
>

--
Yu Shao
Red Hat Asia-Pacific
+61 7 3872 4835
Legal: http://apac.redhat.com/disclaimer

_______________________________________________
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts
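A minimal Python sketch of the kind of coverage check discussed in this thread, assuming the font's mapped code points have already been extracted into a set of integers (for example with a font library; that step is not shown). It enumerates the GB2312 repertoire through Python's built-in `gb2312` codec and splits the missing characters into Hanzi and everything else, which is the split needed both to weight missing letters more heavily than symbols and to compare a missing-glyph count against the 6763 Hanzi + 682 symbols breakdown. The function names `gb2312_repertoire` and `missing_report` are hypothetical, and this is not how fc-cache or fontenc actually perform the check.

```python
"""Hypothetical sketch: count GB2312 characters missing from a font's cmap.

Assumption: the font's mapped code points are already available as a set
of ints (extracting them from the font file is not shown here).
"""


def gb2312_repertoire() -> set[int]:
    """Enumerate GB2312 by decoding every two-byte EUC-CN cell (0xA1-0xFE x 0xA1-0xFE)."""
    chars = set()
    for hi in range(0xA1, 0xFF):
        for lo in range(0xA1, 0xFF):
            try:
                s = bytes([hi, lo]).decode("gb2312")
            except UnicodeDecodeError:
                continue  # unassigned cell in the 94x94 grid
            if len(s) == 1:
                chars.add(ord(s))
    return chars


def is_hanzi(cp: int) -> bool:
    """GB2312 Hanzi all fall in the CJK Unified Ideographs block."""
    return 0x4E00 <= cp <= 0x9FFF


def missing_report(cmap: set[int], repertoire: set[int]) -> dict[str, int]:
    """Count repertoire characters absent from cmap, split into Hanzi vs. other."""
    missing = repertoire - cmap
    hanzi = sum(1 for cp in missing if is_hanzi(cp))
    return {"missing": len(missing), "hanzi": hanzi, "other": len(missing) - hanzi}


if __name__ == "__main__":
    rep = gb2312_repertoire()
    # A fake "font" lacking one Hanzi and one punctuation mark.
    fake_cmap = rep - {ord("中"), ord("、")}
    print(len(rep), missing_report(fake_cmap, rep))
```

If the reference repertoire were a Microsoft code page such as CP936 (roughly GBK, available as Python's `gbk` codec) instead of GB2312, the same check would report far more "missing" characters for a pure GB2312 font, which is the kind of mismatch Pablo asked about.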