Re: [Fonts]Automatic 'lang' determination

Keith Packard Sat, 29 Jun 2002 17:05:09 -0700


Around 1 o'clock on Jun 30, Pablo Saratxaga wrote:


> What are those glyphs? (I'm quite surprised, I would have expected the
> opposite: fonts generally have more glyphs than the standard encodings of
> the iso-8859 family for example)

My definition of language tag is coloured by the OS/2 table codePageRange 
bits from which it was originally defined in fontconfig.  Those bits are 
defined to map to specific Windows code pages; the Latin-1 case doesn't 
map to ISO 8859-1, but rather to code page 1252 for which many fonts are 
missing a few random entries.

Similarly for the other tags, the existing fonts that I have don't 
generally seem to cover the complete windows code page from which the 
codePageRange bit was derived.
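
To make the Latin-1 versus code page 1252 difference concrete, here is a
minimal sketch in Python.  It is not fontconfig code; the helper names
single_byte_repertoire and missing_glyphs are hypothetical, and it assumes
Python's built-in "cp1252" and "latin-1" codecs are an acceptable stand-in
for the Windows code page tables.

```python
def single_byte_repertoire(codec):
    """Return the set of characters a single-byte codec can decode."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # this byte is unassigned in the code page
    return chars


def missing_glyphs(font_chars, codec):
    """List the code page characters that the font does not provide."""
    return sorted(single_byte_repertoire(codec) - set(font_chars))


latin1 = single_byte_repertoire("latin-1")
# Characters that code page 1252 defines beyond ISO 8859-1 (the 0x80-0x9F
# block: euro sign, curly quotes, dashes and so on).
cp1252_only = single_byte_repertoire("cp1252") - latin1

if __name__ == "__main__":
    # A hypothetical font that covers exactly ISO 8859-1 still fails a
    # strict code page 1252 test.
    print(len(missing_glyphs(latin1, "cp1252")), "glyphs missing")
```

A font can therefore cover all of ISO 8859-1 and still fall short of the
repertoire that the codePageRange bit implies.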

> No, the tolerance for missing glyphs in CJK tests should be the same or
> even smaller. The difference is that it isn't needed to test all the glyphs
> for CJK coverages; testing only a set of 256 chosen glyphs would be enough
> (if they are correctly chosen, testing that 256 glyphs are present in a
> font is enough to assure, with 99.99% of confidence, that it covers a given
> CJK language).

I'm not confident enough of this approach; I fear that any set of 256 
glyphs that must appear in a simplified Chinese font may well appear in 
many traditional Chinese (or even Japanese) fonts.  

Certainly we could experimentally determine a reasonable subset, and it's 
completely trivial to change the matching table used in the code.
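
One way to start on such a subset is to look for characters that appear in
the target encoding but in none of the competing ones.  The sketch below is
only an illustration under stated assumptions: it uses Python's "gb2312",
"big5" and "shift_jis" codecs as stand-ins for the language repertoires,
and the function names are hypothetical.  A real test set would still need
checking against actual fonts, since encoding membership says nothing
about which glyphs a given font ships.

```python
def double_byte_repertoire(codec):
    """Return the characters a codec assigns to two-byte sequences."""
    chars = set()
    for lead in range(0x81, 0xFF):
        for trail in range(0x40, 0xFF):
            try:
                text = bytes([lead, trail]).decode(codec)
            except UnicodeDecodeError:
                continue  # not a valid sequence in this codec
            if len(text) == 1:  # skip pairs read as two one-byte characters
                chars.add(text)
    return chars


def distinguishing_chars(target_codec, other_codecs):
    """Characters in the target repertoire that no other repertoire has."""
    chars = double_byte_repertoire(target_codec)
    for codec in other_codecs:
        chars -= double_byte_repertoire(codec)
    return chars


# Candidates for a simplified Chinese test set: in GB2312, but in neither
# Big5 nor Shift_JIS.
simplified_only = distinguishing_chars("gb2312", ["big5", "shift_jis"])

if __name__ == "__main__":
    print(len(simplified_only), "candidate characters")
```

Any 256-glyph sample drawn from outside this difference could just as
easily be satisfied by a traditional Chinese or Japanese font.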

> Of course, complete checking can also be done, but I wonder if it is
> actually useful (I mean, is there a font suitable for simplified chinese
> out there that doesn't encode all the characters of gb2312?)

It seems that this must be the case -- I set the '500' number so high 
because all of the fonts which I have that advertise support for 
simplified Chinese are missing over 200 glyphs from GB2312.  I got
similar results for Japanese fonts, Korean Wansung fonts and traditional 
Chinese fonts.
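
The tolerance test described here can be sketched as follows.  This is not
the fontconfig implementation; gb2312_repertoire, count_missing and
supports are hypothetical names, the GB2312 repertoire is taken from
Python's "gb2312" codec as an assumption, and font_chars stands for
whatever set of characters a font is found to cover.

```python
def gb2312_repertoire():
    """Collect every character Python's gb2312 codec can decode."""
    chars = set()
    for lead in range(0xA1, 0xF8):
        for trail in range(0xA1, 0xFF):
            try:
                chars.add(bytes([lead, trail]).decode("gb2312"))
            except UnicodeDecodeError:
                pass  # unassigned position in the 94x94 grid
    return chars


def count_missing(font_chars, required):
    """Number of required characters the font does not cover."""
    return len(set(required) - set(font_chars))


def supports(font_chars, required, max_missing=500):
    """Accept the font if it misses at most max_missing characters."""
    return count_missing(font_chars, required) <= max_missing


if __name__ == "__main__":
    required = gb2312_repertoire()
    # Hypothetical font that lacks 300 characters of the repertoire.
    font_chars = set(sorted(required)[300:])
    print(count_missing(font_chars, required), supports(font_chars, required))
```

With a threshold of 500 such a font is accepted; with a threshold of 200
it would be rejected, which is the situation for the fonts mentioned above.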

I would need a significantly larger set of fonts than I currently have 
access to if I wanted to generate smaller test char sets.  Now that the 
tests stand in isolation, perhaps those skilled with particular languages 
can develop more specific tests.

> But to handle such case, I think it would be better to choose a given
> definition of "big5" (or several of them) and stick to it, rather than
> allowing a so tremendously big hole as 500 possible missing chars.

Missing 500 from a repertoire of nearly 20000 doesn't seem to render most 
of these fonts unusable.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab



_______________________________________________
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts
