Re: [Fonts]Automatic 'lang' determination

Yu Shao Sun, 30 Jun 2002 18:04:55 -0700

Pablo Saratxaga wrote:

>Kaixo!
>
>On Sat, Jun 29, 2002 at 05:17:04PM -0700, Keith Packard wrote:
> 
>
>>>What are those glyphs? (I'm quite surprised, I would have expected the
>>>opposite: fonts generally have more glyphs than the standard encodings of
>>>the iso-8859 family for example)
>>>
>>My definition of language tag is coloured by the OS/2 table codePageRange 
>>bits from which it was originally defined in fontconfig.  Those bits are 
>>defined to map to specific Windows code pages; the Latin-1 case doesn't 
>>map to ISO 8859-1, but rather to code page 1252 for which many fonts are 
>>missing a few random entries.
>>
>
>But what characters are those?
>Is it possible that they are the ones that have been added to cp1252
>and that didn't exist some years ago?
>I think the matching should be done against the lowest denominator
>and be strict; or to give different weights to the miss of *letters*
>or other symbols (it may be more or less acceptable to get quotation
>marks from another font; bUt lEttErs frOm A dIffErEnt fOnts Is vErY UglY).
>
>>>No, the tolerance for missing glyphs in CJK tests should be the same or
>>>even smaller. The difference is that it isn't needed to test all the glyphs
>>>for CJK coverages; testing only a set of 256 chosen glyphs would be enough
>>>(if they are correctly chosen, testing that 256 glyphs are present in a
>>>font is enough to assure, with 99.99% of confidence, that it covers a given
>>>CJK language).
>>>
>>I'm not confident enough of this approach; I fear that any set of 256 
>>glyphs that must appear in a simplified Chinese font may well appear in 
>>many traditional Chinese (or even Japanese) fonts.
>>
>
>Most do, of course, but there are a lot that don't.
>I only dealt with ~10-15 ttf CJK fonts, but never had false positives
>using that method.
>
>>>out there that doesn't encode all the characters of gb2312?
>>>
>>It seems that this must be the case -- I set the '500' number so high 
>>because all of the fonts which I have that advertise support for 
>>simplified Chinese are missing over 200 glyphs from GB2312.  I got
>>similar results for Japanese fonts, Korean Wansung fonts and traditional 
>>Chinese fonts.
>>
>
>But what characters are those missing?
>Could it be that those are semi-graphic ones, or scripts used by other
>languages (eg: cyrillic, greek, japanese kana in chinese font, etc).
>Here too, different weights should be used, it is not a big problem if
>a CJK font is missing cyrillic, a font designed for russian will be a much
>better choice to render cyrillic anyway; but it may be a big problem if
>some needed characters are missing.
>
>And I'm really surprised by such a high number as 200.
>Are you sure you tested against gb2312 and not against the Microsoft
>codepage based on it (that surely adds several extra characters) ?
>
Hi Keith,


Checking against fontenc, both "AR PL SungtiL GB" and "AR PL KaitiM GB" 
provide all of GB2312's 7445 characters, which include 6763 Hanzi and 682 
symbols. So the 204 missing glyphs that fc-cache reports do not seem correct?
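
The missing-glyph count can be cross-checked independently of fc-cache. What follows is only a minimal sketch: it assumes Python's built-in `gb2312` codec as the reference repertoire (which may differ slightly from the table fontconfig uses), and it assumes the font's coverage is already available as a set of characters. The names `gb2312_repertoire` and `missing_glyphs` are hypothetical and not part of fontconfig or fontenc.

```python
def gb2312_repertoire():
    """Return the set of characters Python's gb2312 codec can decode."""
    chars = set()
    # GB2312 is a 94x94 grid; in EUC-CN both bytes run from 0xA1 to 0xFE.
    for lead in range(0xA1, 0xFF):
        for trail in range(0xA1, 0xFF):
            try:
                chars.add(bytes([lead, trail]).decode("gb2312"))
            except UnicodeDecodeError:
                pass  # unassigned cell in the grid
    return chars


def missing_glyphs(repertoire, font_coverage):
    """Characters in the repertoire that the font does not cover."""
    return repertoire - font_coverage


if __name__ == "__main__":
    repertoire = gb2312_repertoire()
    # Hypothetical font that covers everything except two characters.
    font_coverage = repertoire - {"中", "文"}
    print(len(repertoire), sorted(missing_glyphs(repertoire, font_coverage)))
```

Listing the missing characters themselves, rather than only their count, would show whether they are Hanzi or symbols.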

Regards,

>
>
>>>But to handle such case, I think it would be better to choose a given
>>>definition of "big5" (or several of them) and stick to it, rather than
>>>allowing a so tremendously big hole as 500 possible missing chars.
>>>
>>Missing 500 from a repertoire of nearly 20000 doesn't seem to render most 
>>of these fonts unusable.
>>
>
>It could, it depends on what glyphs are missing.
>
>
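
Pablo's suggestion of giving different weights to missing letters and missing symbols could look roughly like the sketch below. It is an illustration only: the function name `weighted_miss_score` and the weights 10 and 1 are assumptions chosen for the example, and "letter" is approximated by the Unicode general categories starting with `L` (which include Hanzi as `Lo`).

```python
import unicodedata


def weighted_miss_score(repertoire, font_coverage,
                        letter_weight=10.0, other_weight=1.0):
    """Sum a penalty over missing characters, penalising letters more."""
    score = 0.0
    for ch in repertoire - font_coverage:
        # Categories Lu, Ll, Lt, Lm, Lo all count as letters here.
        is_letter = unicodedata.category(ch).startswith("L")
        score += letter_weight if is_letter else other_weight
    return score


if __name__ == "__main__":
    repertoire = {"a", "b", "\u201c"}  # two letters and a quotation mark
    print(weighted_miss_score(repertoire, {"a"}))
```

A font would then be accepted for a language when its score stays below some threshold, instead of when a raw count of missing glyphs does.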


-- 
Yu Shao
Red Hat Asia-Pacific
+61 7 3872 4835
Legal:   http://apac.redhat.com/disclaimer



_______________________________________________
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts
