[Fonts] Adding language information for TrueType fonts

Many TrueType fonts include an OS/2 table which holds codePageRange bits. These bits indicate the old OS/2 code pages supported by the font, and hence indirectly indicate which languages the font is intended to support. These tables, however, are quite primitive: they hold only 64 bits in total, so they can indicate support for only a few languages.

My question is whether I should take these TrueType fonts and test them against my new coverage tables, at least for languages which aren't covered by the codePageRange bits.

I now have coverage information for 76 of the 139 ISO 639-1 language names; I used the Unicode code charts to mark coverage for the Indic languages and a few other scripts:

    Bengali (BN)
    Tibetan (BO)
    Gujarati (GU)
    Khmer (KM)
    Kannada (KN)
    Lao (LO)
    Malayalam (ML)
    Mongolian (MN)
    Oriya (OR)
    Sinhala (Sinhalese) (SI)
    Tamil (TA)
    Telugu (TE)
    Tagalog (TL)

Given that these languages have unique alphabets, this method seems relatively sound. I'm still missing several Indic languages and all of the non-Arabic African languages.

I did remove the @ and ` marks from the Latin scripts; that should leave each of them including only the alphabet.

I've also committed this whole mess to XFree86 CVS; the coverage files can be found in xc/lib/fontconfig/fc-lang/*.orth

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab

___
Fonts mailing list
http://XFree86.Org/mailman/listinfo/fonts
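A minimal sketch of the coverage test proposed above, assuming both the font's character coverage and a language's orthography are available as ascending-sorted arrays of Unicode code points. The `CharList` type and the `covers` function are hypothetical names for illustration; they are not the fontconfig API, and the .orth file format is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container: code points sorted in ascending order. */
typedef struct {
    const uint32_t *cps;
    size_t n;
} CharList;

/*
 * Returns 1 if every code point required by the orthography is present
 * in the font's coverage, 0 otherwise. Both lists must be sorted, so a
 * single merge-style pass is enough.
 */
static int covers(const CharList *font, const CharList *orth)
{
    size_t i = 0, j = 0;

    assert(font != NULL && orth != NULL);

    while (j < orth->n) {
        /* Skip font code points below the one we are looking for. */
        while (i < font->n && font->cps[i] < orth->cps[j])
            i++;
        if (i == font->n || font->cps[i] != orth->cps[j])
            return 0;   /* required code point is missing */
        j++;
    }
    return 1;
}
```

A font would then be marked as supporting a language only when `covers` succeeds for that language's list; under the proposal above, the test would be applied at least to the languages the codePageRange bits cannot express.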
Re: [Fonts] Re: [I18n] language tags in fontconfig

Kaixo!

On Sat, Jul 06, 2002 at 03:33:40AM -0700, Keith Packard wrote:

> I don't know why all of the latin languages include @ and ', it's probably just a mistake; they're easily removed.

For the '@' I agree; but the apostrophe may be very important for some languages (e.g. French, English).

> The reason I haven't included the Euro is that this would disable the use of any Latin-1 fonts.

Also, monetary symbols could be taken from another font without too much of a problem, and they are also quite irrelevant to language (you can very well put an amount in euros in a Chinese text, and an amount in dollars in an Italian text...).

> I'm also uncomfortable about dropping requirements for numerals; they are more like letters than punctuation. The question is whether you'd want to skip a font just because it didn't support the Basic Latin digits. Applications that I'm writing now (Pango, Mozilla and Tcl/Tk) will failover to another font for missing glyphs.

I think for Latin-based languages the numerals should always be there (as well as the basic ASCII set). But for non-Latin languages, the whole ASCII set (including the numerals) may be missing from the font; so, for those non-Latin languages, the requirement for the numerals can be skipped.

> I will note that my current Arabic table is missing the Arabic numerals, that seems wrong to me.

In fact the practice of using western Arabic digits, eastern Arabic digits, or ASCII-style digits varies from country to country, and maybe even depends on the context (e.g. a text using the Arabic shapes, but a mostly numeric document, like a spreadsheet, using the ASCII-style ones).

--
Ki ça vos våye bén,
Pablo Saratxaga
http://chanae.stben.be/pablo/
PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Italian or Portuguese]
[Fonts] fcfreetype.c

Hi Keith,

This looks like a typo, and I think using FcCodePageSet is always safer?

Shao

diff -uNr fcfreetype.c.orig fcfreetype.c
--- fcfreetype.c.orig   Sun Jul 7 22:25:37 2002
+++ fcfreetype.c        Sun Jul 7 22:27:49 2002
@@ -365,7 +365,7 @@
             if (matchCodePage[i])
             {
                 if (!FcPatternAddString (pat, FC_LANG,
-                                         FcCodePageRange[i].name))
+                                         FcCodePageSet[i].name))
                     goto bail1;
                 hasLang = TRUE;
             }
Re: [Fonts] fcfreetype.c

Around 22 o'clock on Jul 7, Yu Shao wrote:

> It seems a typo and I think using FcCodePageSet is always safer?

Good catch; there was a typo, but that code has since been deleted in favor of the new RFC 3066-based language detection.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab
[Fonts] Re: [I18n] Unicode coverage for languages

Around 23 o'clock on Jul 7, Roger So wrote:

> Certainly; but have you considered the case that zh-HK and zh-MO users prefer zh-TW fonts over zh-CN fonts, and vice versa for zh-SG? (What other Chinese-speaking regions are there... perhaps zh-MY?)

Yes, each language-country pair may specify its own orthography. zh-HK and zh-MO could use the zh-TW set.

> To complicate matters, zh-HK uses traditional Chinese, but with more characters than are usual for zh-TW. (Big5 vs Big5 HKSCS)

That's fine; zh-HK would use a separate orthography that included the additional glyphs.

> And of course, many fonts from China now cover most characters defined in GB18030, which means if using coverage tables, these fonts will appear to support both zh-CN and zh-TW...

Yes, GB18030 makes this harder: my GB18030 fonts cover all of Big5, making it essentially impossible to distinguish them by code coverage. Fortunately, all of the GB18030 fonts that I've seen are in TrueType format and include the appropriate OS/2 codePageRange bits, which indicate design intent.

> Otherwise, I think using RFC-3066 is a good idea. I've only considered Chinese here as I'm a native Chinese speaker; and I don't think these problems crop up in other languages.

Han unification produces its own issues here, which can best be resolved by having fonts specify their target languages. I suspect the best plan may well be to use Unicode coverage for language inclusion and then exclude certain Han languages based on the codePageRange bits.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab
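The closing suggestion above (include languages by Unicode coverage, then exclude Han languages using the codePageRange bits) might look roughly like the sketch below. The bit positions are the CJK entries of ulCodePageRange1 in the OS/2 table; the `LANG_*` flags and the `filter_han_langs` function are hypothetical names for illustration, and treating "no CJK bit set" as "trust coverage alone" is an assumption, not something the thread settles.

```c
#include <stdint.h>

/* Hypothetical language flags produced by the coverage test. */
enum {
    LANG_EN    = 1u << 0,
    LANG_JA    = 1u << 1,
    LANG_ZH_CN = 1u << 2,
    LANG_KO    = 1u << 3,
    LANG_ZH_TW = 1u << 4
};

/* CJK bits of OS/2 ulCodePageRange1. */
#define CP_JIS_JAPAN           (UINT32_C(1) << 17)  /* code page 932  */
#define CP_CHINESE_SIMPLIFIED  (UINT32_C(1) << 18)  /* code page 936  */
#define CP_KOREAN_WANSUNG      (UINT32_C(1) << 19)  /* code page 949  */
#define CP_CHINESE_TRADITIONAL (UINT32_C(1) << 20)  /* code page 950  */
#define CP_KOREAN_JOHAB        (UINT32_C(1) << 21)  /* code page 1361 */

/*
 * Start from the languages found by coverage; if the font declares any
 * CJK code page, keep only the Han languages it declares. Non-Han
 * languages are never removed.
 */
static unsigned filter_han_langs(unsigned by_coverage, uint32_t code_page_range1)
{
    const unsigned han = LANG_JA | LANG_ZH_CN | LANG_KO | LANG_ZH_TW;
    unsigned declared = 0;

    if (code_page_range1 & CP_JIS_JAPAN)           declared |= LANG_JA;
    if (code_page_range1 & CP_CHINESE_SIMPLIFIED)  declared |= LANG_ZH_CN;
    if (code_page_range1 & CP_CHINESE_TRADITIONAL) declared |= LANG_ZH_TW;
    if (code_page_range1 & (CP_KOREAN_WANSUNG | CP_KOREAN_JOHAB))
        declared |= LANG_KO;

    if (declared == 0)
        return by_coverage;  /* assumption: no stated intent, trust coverage */

    return (by_coverage & ~han) | (by_coverage & declared);
}
```

With this, a GB18030 font whose coverage satisfies both zh-CN and zh-TW but which sets only the Simplified Chinese bit would be reported as zh-CN only.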
Re: [Fonts] [I18n] language tags in fontconfig

Keith Packard wrote:

> I got the European coverage information from http://www.everytype.com/alphabets

I can't find www.everytype.com in the DNS; is that a typo?

I'm curious because I can't understand the differences between xc/lib/fontconfig/fc-lang/en.orth and xc/lib/fontconfig/fc-lang/fr.orth. In particular, I remember 00e1 (a acute/á) but not 00f1 (n tilde/ñ) from my French lessons.

--
Dr. Andrew C. Aitchison
Computer Officer, DPMMS, Cambridge
http://www.dpmms.cam.ac.uk/~werdna
[Fonts] Re: [I18n] Unicode coverage for languages

On Sat, 2002-07-06 at 13:34, Keith Packard wrote:

> My plan is to have fonts advertise the complete set of languages that they cover, and then to allow them to further distinguish languages with country codes as needed (zh-TW vs zh-CN).
>
> Now matching can take place using the language tags; a font supporting the language for a different country will match less strongly than a font matching the language for the correct country. Both of these will match more strongly than a font not supporting the language at all.
>
> This has the benefit of making traditional Chinese fonts preferred over Japanese fonts for the display of simplified Chinese documents. I think this will work better than the current hack using OS/2 codePageRange bits.

Certainly; but have you considered the case that zh-HK and zh-MO users prefer zh-TW fonts over zh-CN fonts, and vice versa for zh-SG? (What other Chinese-speaking regions are there... perhaps zh-MY?)

To complicate matters, zh-HK uses traditional Chinese, but with more characters than are usual for zh-TW. (Big5 vs Big5 HKSCS)

And of course, many fonts from China now cover most characters defined in GB18030, which means if using coverage tables, these fonts will appear to support both zh-CN and zh-TW...

Otherwise, I think using RFC-3066 is a good idea. I've only considered Chinese here as I'm a native Chinese speaker; and I don't think these problems crop up in other languages.

--
Roger So, Debian Developer
Sun Wah Linux Limited, i18n/L10n Project Leader
Tel: +852 2250 0230
Fax: +852 2259 9112
http://www.sw-linux.com/
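The three-level ranking in the quoted plan (same language and country, same language but a different country, different language) can be sketched as a comparison of RFC 3066 tags. This is only an illustration: `LangMatch` and `lang_compare` are hypothetical names, not the fontconfig interface, and treating a tag with no country ("zh") as a weaker match for "zh-CN" is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Higher values mean a stronger match. */
typedef enum {
    LANG_DIFFERENT     = 0,  /* primary language differs            */
    LANG_SAME_LANGUAGE = 1,  /* same language, different country    */
    LANG_EQUAL         = 2   /* whole tag matches                   */
} LangMatch;

/* Length of the primary subtag, i.e. everything before the first '-'. */
static size_t primary_len(const char *tag)
{
    size_t n = 0;
    while (tag[n] != '\0' && tag[n] != '-')
        n++;
    return n;
}

/* Case-insensitive comparison of the first n bytes. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
            return 0;
    return 1;
}

static LangMatch lang_compare(const char *font_lang, const char *wanted)
{
    size_t fl = primary_len(font_lang);
    size_t wl = primary_len(wanted);
    size_t fr, wr;

    if (fl != wl || !equal_nocase(font_lang, wanted, fl))
        return LANG_DIFFERENT;

    /* Primary subtags agree; compare whatever follows them. */
    fr = strlen(font_lang + fl);
    wr = strlen(wanted + wl);
    if (fr == wr && equal_nocase(font_lang + fl, wanted + wl, fr))
        return LANG_EQUAL;
    return LANG_SAME_LANGUAGE;
}
```

For a zh-CN document, a zh-TW font scores `LANG_SAME_LANGUAGE` and a ja font scores `LANG_DIFFERENT`, which gives the preference for traditional Chinese fonts over Japanese fonts described in the plan. The zh-HK and zh-SG preferences raised in the reply are not captured by a plain tag comparison like this one.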
Re: [Fonts] [I18n] language tags in fontconfig

Around 10 o'clock on Jul 7, Dr Andrew C Aitchison wrote:

> I'm curious because I can't understand the differences between xc/lib/fontconfig/fc-lang/en.orth and xc/lib/fontconfig/fc-lang/fr.orth. In particular, I remember 00e1 (a acute/á) but not 00f1 (n tilde/ñ) from my French lessons.

(Are you sending text in UTF-8?)

The orthographies I built were taken from a source which attempted to include every letter needed to write a particular language, even those which might be used only infrequently. For English, we have words like:

    rôle, à la king, naïve

While the ASCII-ification of English is pervasive, my Webster's New World Dictionary (not known for its inclusiveness in general) still lists these spellings as native.

While I've never seen ñ in my limited exposure to French, I don't find it impossible to believe that it occurs in some limited contexts, perhaps for place names along the border with Spain?

The only questionable thing I believe I've done is to eliminate the OE ligatures and Y with diaeresis from the French list; those aren't in Latin-1, and I wanted to permit Latin-1 fonts to be marked as supporting French.

Note that none of this prohibits applications and users from explicitly selecting a font which is inappropriate for their current locale or document language: explicit family names are now given greater weight than language matching when selecting fonts.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab
[Fonts] Using current locale in font selection

Much as I hate the C locale model, I'm wondering if I shouldn't use the current locale as a language hint where applications don't provide explicit language information when selecting fonts.

This would make the generic aliases (like sans-serif) pick a font appropriate for the locale instead of some random font most likely suitable for Latin languages.

Or would this only lead to confusion and chaos?

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab
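One way the locale hint could be derived, as a rough sketch: take a POSIX-style locale name such as "fr_FR.UTF-8", drop the codeset and modifier, and turn the underscore into a hyphen to get an RFC 3066-style tag. The function name `locale_to_lang_tag` is hypothetical, and returning "no hint" for the "C" and "POSIX" locales is an assumption; the thread does not say how the locale would actually be mapped.

```c
#include <stddef.h>
#include <string.h>

/*
 * Converts a locale name of the form language[_TERRITORY][.codeset][@modifier]
 * into a language tag such as "fr-FR". Returns 1 and fills `out` on success;
 * returns 0 when the locale carries no language hint ("C", "POSIX", empty)
 * or when the tag does not fit in `out`.
 */
static int locale_to_lang_tag(const char *locale, char *out, size_t out_size)
{
    size_t n = 0, i;

    if (locale == NULL || out == NULL || out_size == 0)
        return 0;

    /* Keep only the part before the codeset ('.') or modifier ('@'). */
    while (locale[n] != '\0' && locale[n] != '.' && locale[n] != '@')
        n++;

    if (n == 0 || n >= out_size)
        return 0;
    if ((n == 1 && locale[0] == 'C') ||
        (n == 5 && strncmp(locale, "POSIX", 5) == 0))
        return 0;

    for (i = 0; i < n; i++)
        out[i] = (locale[i] == '_') ? '-' : locale[i];
    out[n] = '\0';
    return 1;
}
```

The current locale name can be read with the standard `setlocale(LC_CTYPE, NULL)` call; which locale category should drive font selection is left open here. When the function returns 0, the generic aliases would resolve as they do today, with no language preference.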