Kaixo!

On Sat, Jun 29, 2002 at 01:20:34PM -0700, Keith Packard wrote:

> > A font is suited for a given language when it covers *ALL* of the codepoints
> > needed for that language.
>
> Yes, that's obviously true, but the problem is that I don't have tables for
> each language indicating the required codepoints, all I have are tables
> listing Unicode values in encodings traditionally used for each language.
> These tables almost always include a few (1-5) glyphs which many fonts are
> missing.
What are those glyphs? (I'm quite surprised; I would have expected the opposite: fonts generally have more glyphs than the standard encodings of the iso-8859 family, for example.)

>> So, the tests for CJK languages and for other languages are clearly different,
>> only CJK languages can go with testing only a "significant fraction",
>> for all other languages all chars must be tested.
>
> Yes, the tolerance value given for the Han languages is 500 codepoints
> while the value for non-Han languages is two orders of magnitude smaller.

No, the tolerance for missing glyphs in CJK tests should be the same or even smaller. The difference is that it isn't necessary to test all the glyphs for CJK coverage; testing only a set of 256 chosen glyphs would be enough (if they are well chosen, checking that those 256 glyphs are present in a font is enough to be sure, with 99.99% confidence, that it covers a given CJK language).

That cannot be done for the 8-bit Latin/Cyrillic encodings because there is too much overlap between them (in the case of iso-8859-1/iso-8859-15 the overlap is 97%, for example). While there is also a lot of overlap between CJK encodings, there are large ranges of non-overlapping chars: chars that appear only in the Japanese encoding, or only in gb2312, or only in big5, etc. (By "only" I mean "not in any other widely used legacy encoding", so explicitly excluding Unicode, which of course includes them all.) As those "exclusive" chars are numerous enough, it is possible to test for the presence of some of them in a font and determine the language coverage from there.

Of course, complete checking can also be done, but I wonder if it is actually useful (I mean, is there a font suitable for Simplified Chinese out there that doesn't encode all the characters of gb2312? It would be like a font for English that is missing the letter "r").
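To make the idea concrete, here is a rough sketch in Python (not fontconfig code, only an illustration). It assumes that Python's bundled codecs (`gb2312`, `big5`, `shift_jis`, `euc_kr`) are good enough stand-ins for the legacy encodings, and that the font's coverage is already available as a set of characters. The names `repertoire`, `exclusive_probes` and `covered_encodings` are made up for this example, and adding the Korean encoding is my own assumption. The probes are picked at random here, where a real list would be chosen by hand.

```python
import random

# Assumption: these Python codecs stand in for the legacy encodings.
CODECS = ("gb2312", "big5", "shift_jis", "euc_kr")


def repertoire(codec):
    """Collect the characters a codec encodes as two-byte sequences."""
    chars = set()
    for lead in range(0x81, 0xFF):
        for trail in range(0x40, 0xFF):
            try:
                text = bytes((lead, trail)).decode(codec)
            except UnicodeDecodeError:
                continue  # not a valid two-byte code in this encoding
            # Skip pairs that decode as two separate single-byte characters.
            if len(text) == 1 and ord(text) > 0x7F:
                chars.add(text)
    return chars


def exclusive_probes(repertoires, sample_size=256, seed=0):
    """For each encoding, pick probe chars found in no other encoding."""
    probes = {}
    for name, chars in repertoires.items():
        others = set()
        for other_name, other_chars in repertoires.items():
            if other_name != name:
                others |= other_chars
        unique = sorted(chars - others)
        rng = random.Random(seed)  # fixed seed keeps the probe list stable
        count = min(sample_size, len(unique))
        probes[name] = set(rng.sample(unique, count))
    return probes


def covered_encodings(font_chars, probes, tolerance=0):
    """Names of the encodings whose probes are (almost) all in the font."""
    return [
        name
        for name, probe in probes.items()
        if probe and len(probe - font_chars) <= tolerance
    ]
```

A caller would build the tables once with `probes = exclusive_probes({c: repertoire(c) for c in CODECS})` and then run `covered_encodings(font_chars, probes)` for each font, where `font_chars` is whatever set of characters the font claims to have glyphs for. With `tolerance=0` a single missing probe is enough to reject the font for that encoding.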
"Big5" is a bit more problematic, as there is no such thing as a well-defined "Big5" encoding. Rather, in the pure Microsoftian tradition (big5 comes, after all, from that side), there are a number of revisions all carrying the same name, each adding some characters, so an older font can lack chars that a newer one has (according to a newer definition of "big5"). But to handle that case, I think it would be better to choose a given definition of "big5" (or several of them) and stick to it, rather than allowing such a tremendously big hole as 500 possibly missing chars.

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://chanae.stben.be/pablo/ PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Italian or Portuguese]