[Fonts]Adding language information for TrueType fonts

2002-07-07 Thread Keith Packard


Many TrueType fonts include an OS/2 table which holds codePageRange bits.  
These bits indicate the old OS/2 code pages supported by the font, and 
hence indirectly indicate which languages the font is intended to support.

These tables, however, are quite primitive, indicating support for only a 
very few languages as they hold only 64 bits total.
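
A minimal sketch of how the codePageRange bits can be turned into language hints. The bit numbers are those of the first 32-bit word (ulCodePageRange1) in the OpenType OS/2 table; the struct, the function name and the choice of language tags are assumptions made for illustration and are not taken from fontconfig.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per ulCodePageRange1 bit that suggests a language.
 * The bit-to-tag mapping is an illustrative assumption. */
struct codepage_lang {
    unsigned    bit;   /* bit index within ulCodePageRange1 */
    const char *lang;  /* language tag the code page suggests */
};

static const struct codepage_lang codepage_langs[] = {
    { 17, "ja"    },  /* code page 932, JIS/Japan */
    { 18, "zh-CN" },  /* code page 936, Simplified Chinese */
    { 19, "ko"    },  /* code page 949, Korean Wansung */
    { 20, "zh-TW" },  /* code page 950, Traditional Chinese */
};

/* Store up to max language tags implied by range1 in out and
 * return how many were stored. */
size_t codepage_languages(uint32_t range1, const char **out, size_t max)
{
    size_t n = 0;
    size_t count = sizeof codepage_langs / sizeof codepage_langs[0];

    for (size_t i = 0; i < count && n < max; i++) {
        if (range1 & (UINT32_C(1) << codepage_langs[i].bit))
            out[n++] = codepage_langs[i].lang;
    }
    return n;
}
```

With only 64 bits available, a table like this cannot grow much beyond the handful of code pages the OS/2 table defines, which is the limitation described above.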

My question is whether I should take these TrueType fonts and test them 
against my new coverage tables, at least for languages which aren't 
covered by the codePageRange bits.
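
Testing a font against a coverage table could be sketched roughly as follows. This assumes the font's characters and the orthography are both available as plain arrays of Unicode code points, with the font array sorted in ascending order; the function name and the array representation are hypothetical and are not the format of the .orth files.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Order two code points for bsearch. */
static int cmp_ucs4(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Return true when every code point required by the orthography is
 * present in font_chars, which must be sorted in ascending order. */
bool font_covers(const uint32_t *font_chars, size_t n_font,
                 const uint32_t *required, size_t n_required)
{
    for (size_t i = 0; i < n_required; i++) {
        if (!bsearch(&required[i], font_chars, n_font,
                     sizeof font_chars[0], cmp_ucs4))
            return false;
    }
    return true;
}
```

A font would then be tagged with a language only when font_covers() succeeds for that language's table.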

I now have coverage information for 76 of the 139 ISO 639-1 language 
names; I used the Unicode code charts to mark coverage for the Indic 
languages and a few other scripts:

Bengali (BN)
Tibetan (BO)
Gujarati (GU)
Khmer (KM)
Kannada (KN)
Lao (LO)
Malayalam (ML)
Mongolian (MN)
Oriya (OR)
Sinhala (Sinhalese) (SI)
Tamil (TA)
Telugu (TE)
Tagalog (TL)

Given that these languages have unique alphabets, this method seems 
relatively sound.  I'm still missing several Indic languages and
all of the non-Arabic African languages.

I did remove the @ and ` marks from the Latin scripts; that should leave 
all of them including only the alphabet.

I've also committed this whole mess to XFree86 CVS; the coverage 
files can be found in xc/lib/fontconfig/fc-lang/*.orth

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab


___
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts



Re: [Fonts]Re: [I18n]language tags in fontconfig

2002-07-07 Thread Pablo Saratxaga

Kaixo!

On Sat, Jul 06, 2002 at 03:33:40AM -0700, Keith Packard wrote:
 
> I don't know why all of the Latin languages include  and ', it's 
> probably just a mistake; they're easily removed.

For the '' I agree; but the apostrophe may be very important for
some languages (eg: French, English)

> The reason I haven't included the Euro is that this would disable the use
> of any Latin-1 fonts.

Also, monetary symbols could be taken from another font without too much
problem; and they are also quite irrelevant to language (You can very well
put an amount in euros in a Chinese text, and an amount in dollars in
an Italian text...)

> I'm also uncomfortable about dropping requirements for numerals;
> they are more like letters than punctuation.
>
> The question is whether you'd want to skip a font just because it didn't 
> support the Basic Latin digits.  Applications that I'm writing now (Pango, 
> Mozilla and Tcl/Tk) will failover to another font for missing glyphs.

I think for Latin-based languages the numerals should always be there
(as well as the basic ASCII set).
But for non-Latin languages, the whole ASCII set (including the numerals)
may be missing from the font; so, for those non-Latin languages, the
check for the numerals can be skipped.

> I will note that my current Arabic table is missing the Arabic numerals,
> that seems wrong to me.

In fact the practice of using western-Arabic digits, eastern-Arabic digits,
or ASCII-style digits varies from country to country, and may even depend
on the context (eg: Arabic shapes inside running text, but ASCII-style ones
in a mostly numeric document such as a spreadsheet)
 
-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://chanae.stben.be/pablo/   PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Italian or Portuguese]





[Fonts]fcfreetype.c

2002-07-07 Thread Yu Shao

Hi Keith,

It seems a typo and I think using FcCodePageSet is always safer?

Shao

diff -uNr fcfreetype.c.orig fcfreetype.c
--- fcfreetype.c.orig   Sun Jul  7 22:25:37 2002
+++ fcfreetype.c        Sun Jul  7 22:27:49 2002
@@ -365,7 +365,7 @@
         if (matchCodePage[i])
         {
             if (!FcPatternAddString (pat, FC_LANG,
-                                     FcCodePageRange[i].name))
+                                     FcCodePageSet[i].name))
                 goto bail1;
             hasLang = TRUE;
         }







Re: [Fonts]fcfreetype.c

2002-07-07 Thread Keith Packard


Around 22 o'clock on Jul 7, Yu Shao wrote:

> It seems a typo and I think using FcCodePageSet is always safer?

Good catch, there was a typo, but that code has since been deleted in 
favor of the new RFC 3066-based language detection.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab





[Fonts]Re: [I18n]Unicode coverage for languages

2002-07-07 Thread Keith Packard


Around 23 o'clock on Jul 7, Roger So wrote:

> Certainly; but have you considered the case that zh-HK and zh-MO users
> prefer zh-TW fonts over zh-CN fonts, and vice versa for zh-SG? (What
> other Chinese-speaking regions are there... perhaps zh-MY?)

Yes, each language-country pair may specify its own orthography.
zh-HK and zh-MO could use the zh-TW set.

> To complicate matters, zh-HK uses traditional Chinese, but with more
> characters than is usual with zh-TW. (Big5 vs Big5 HKSCS)

That's fine; zh-HK would use a separate orthography that included the 
additional glyphs.

> And of course, many fonts from China now cover most characters defined
> in GB18030, which means if using coverage tables, these fonts will
> appear to support both zh-CN and zh-TW...

Yes, GB18030 makes this harder -- my GB18030 fonts cover all of Big5 
making it essentially impossible to distinguish by code coverage.  
Fortunately, all of the GB18030 fonts that I've seen are in TrueType 
format and include the appropriate OS/2 codePageRange bits which indicate 
design intent.

> Otherwise, I think using RFC-3066 is a good idea. I've only considered
> Chinese here as I'm a native Chinese speaker; and I don't think these
> problems crop up in other languages.

Han unification produces its own issues here which can best be resolved 
by having fonts specify their target languages.  I suspect the best plan 
may well be to use Unicode coverage for language inclusion and then 
exclude certain Han languages based on the codePageRange bits.
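
One way to read this plan as code, as a rough sketch only: start from the set of Han languages whose coverage table the font satisfies, then keep only those that the font's codePageRange bits declare. The flag names, the function name and the rule for fonts that declare no CJK code page are assumptions of this sketch; the bit numbers are those of ulCodePageRange1 in the OpenType OS/2 table.

```c
#include <stdint.h>

/* Hypothetical flags for the Han languages a font may be tagged with. */
enum {
    HAN_JA    = 1u << 0,
    HAN_ZH_CN = 1u << 1,
    HAN_KO    = 1u << 2,
    HAN_ZH_TW = 1u << 3
};

/* ulCodePageRange1 bits for the CJK code pages. */
#define CP_BIT_JAPAN         (UINT32_C(1) << 17)  /* code page 932 */
#define CP_BIT_SIMP_CHINESE  (UINT32_C(1) << 18)  /* code page 936 */
#define CP_BIT_KOREAN        (UINT32_C(1) << 19)  /* code page 949 */
#define CP_BIT_TRAD_CHINESE  (UINT32_C(1) << 20)  /* code page 950 */

/* covered: Han languages the font satisfies by Unicode coverage.
 * range1:  the font's ulCodePageRange1 word.
 * Returns the Han languages to advertise.  When the font declares no
 * CJK code page at all, the coverage result is returned unchanged
 * (an assumption; the thread does not settle this case). */
unsigned filter_han_langs(unsigned covered, uint32_t range1)
{
    unsigned declared = 0;

    if (range1 & CP_BIT_JAPAN)        declared |= HAN_JA;
    if (range1 & CP_BIT_SIMP_CHINESE) declared |= HAN_ZH_CN;
    if (range1 & CP_BIT_KOREAN)       declared |= HAN_KO;
    if (range1 & CP_BIT_TRAD_CHINESE) declared |= HAN_ZH_TW;

    return declared ? (covered & declared) : covered;
}
```

A GB18030 font that covers all of Big5 but sets only the code page 936 bit would then be advertised for zh-CN alone.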

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab





Re: [Fonts] [I18n] language tags in fontconfig

2002-07-07 Thread Dr Andrew C Aitchison


Keith Packard wrote:
> I got the European coverage information from
>
> http://www.everytype.com/alphabets

I can't find www.everytype.com in the DNS; is that a typo?

I'm curious because I can't understand the differences between
xc/lib/fontconfig/fc-lang/en.orth and
xc/lib/fontconfig/fc-lang/fr.orth 

In particular I remember 00e1 (a acute/á) but not 00f1 (n tilde/ñ)
from my French lessons.

-- 
Dr. Andrew C. Aitchison Computer Officer, DPMMS, Cambridge
[EMAIL PROTECTED]   http://www.dpmms.cam.ac.uk/~werdna




[Fonts]Re: [I18n]Unicode coverage for languages

2002-07-07 Thread Roger So

On Sat, 2002-07-06 at 13:34, Keith Packard wrote:
> My plan is to have fonts advertise the complete set of languages that they 
> cover, and then to allow them to further distinguish languages with 
> country codes as needed (zh-TW vs zh-CN).  
>
> Now matching can take place using the language tags; a font supporting the
> language for a different country will match less strongly than a font
> matching the language for the correct country.  Both of these will match
> more strongly than a font not supporting the language at all.  This has the
> benefit of making traditional Chinese fonts preferred over Japanese fonts
> for the display of simplified Chinese documents.
>
> I think this will work better than the current hack using OS/2 
> codePageRange bits.
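
The three match strengths in the quoted plan can be sketched as a comparison of two RFC 3066 tags. This is a hedged illustration only: the enum and function names are hypothetical, not fontconfig's API, and treating a tag without a country ("zh") as a weaker match for "zh-TW" is an assumption of the sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Match strength, strongest first.  Lower values match more strongly. */
enum lang_match {
    LANG_SAME = 0,               /* same language and country */
    LANG_DIFFERENT_COUNTRY = 1,  /* same language, other country */
    LANG_DIFFERENT = 2           /* language not supported */
};

/* Length of the primary language subtag (the text before the first '-'). */
static size_t primary_len(const char *tag)
{
    size_t n = 0;
    while (tag[n] != '\0' && tag[n] != '-')
        n++;
    return n;
}

/* Compare one byte of each tag, ignoring ASCII case. */
static int same_char(char a, char b)
{
    return tolower((unsigned char)a) == tolower((unsigned char)b);
}

enum lang_match lang_compare(const char *wanted, const char *font_lang)
{
    size_t wl = primary_len(wanted);
    size_t fl = primary_len(font_lang);
    size_t i;

    if (wl != fl)
        return LANG_DIFFERENT;
    for (i = 0; i < wl; i++)
        if (!same_char(wanted[i], font_lang[i]))
            return LANG_DIFFERENT;

    /* Primary subtags agree; any difference in the rest is a country
     * (or other subtag) mismatch. */
    while (wanted[i] != '\0' && font_lang[i] != '\0') {
        if (!same_char(wanted[i], font_lang[i]))
            return LANG_DIFFERENT_COUNTRY;
        i++;
    }
    return wanted[i] == font_lang[i] ? LANG_SAME : LANG_DIFFERENT_COUNTRY;
}
```

For a simplified Chinese document (zh-CN), a zh-TW font yields LANG_DIFFERENT_COUNTRY and a Japanese font yields LANG_DIFFERENT, so the traditional Chinese font sorts first, as the plan describes.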

Certainly; but have you considered the case that zh-HK and zh-MO users
prefer zh-TW fonts over zh-CN fonts, and vice versa for zh-SG? (What
other Chinese-speaking regions are there... perhaps zh-MY?)

To complicate matters, zh-HK uses traditional Chinese, but with more
characters than is usual with zh-TW. (Big5 vs Big5 HKSCS)

And of course, many fonts from China now cover most characters defined
in GB18030, which means if using coverage tables, these fonts will
appear to support both zh-CN and zh-TW...

Otherwise, I think using RFC-3066 is a good idea. I've only considered
Chinese here as I'm a native Chinese speaker; and I don't think these
problems crop up in other languages.

-- 
  Roger So                 Debian Developer
  Sun Wah Linux Limited    i18n/L10n Project Leader
  Tel: +852 2250 0230      [EMAIL PROTECTED]
  Fax: +852 2259 9112      http://www.sw-linux.com/



Re: [Fonts] [I18n] language tags in fontconfig

2002-07-07 Thread Keith Packard


Around 10 o'clock on Jul 7, Dr Andrew C Aitchison wrote:

> I'm curious because I can't understand the differences between
> xc/lib/fontconfig/fc-lang/en.orth and
> xc/lib/fontconfig/fc-lang/fr.orth 
>
> In particular I remember 00e1 (a acute/á) but not 00f1 (n tilde/ñ)
> from my French lessons.

(Are you sending text in UTF-8?)

The orthographies I built were taken from a source which attempted to 
include every letter needed to write a particular language, even those 
which might be only infrequently used.  For English, we have words
like:

rôle, à la king, naïve

While the ASCII-ification of English is pervasive, my Webster's New World 
Dictionary (not known for its inclusiveness in general) still lists these
spellings as native.

While I've never seen ñ in my limited exposure to French, I don't find it 
impossible to believe that it occurs in some limited contexts, perhaps for 
place names along the border with Spain?

The only questionable thing I believe I've done is to eliminate the OE 
ligatures and Y with diaeresis from the French list -- those aren't in 
Latin 1, and I wanted to permit Latin-1 fonts to be marked as supporting 
French.

Note that none of this prohibits applications and users from explicitly 
selecting a font which is inappropriate for their current locale or 
document language -- explicit family names are now given greater weight 
than language matching when selecting fonts.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab





[Fonts]Using current locale in font selection

2002-07-07 Thread Keith Packard


Much as I hate the C locale model, I'm wondering if I shouldn't use the 
current locale as a language hint where applications don't provide 
explicit language information when selecting fonts.  This would make
the generic aliases (like sans-serif) pick a font appropriate for the 
locale instead of some random font most likely suitable for Latin 
languages.
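
A minimal sketch of what deriving a language hint from the locale might look like. It assumes the usual POSIX locale name form (language_COUNTRY.codeset@modifier) and the usual LC_ALL, LC_CTYPE, LANG precedence; the function names and the choice of LC_CTYPE as the relevant category are assumptions of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the first non-empty locale name from the environment,
 * or NULL when none is set. */
const char *current_locale_name(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v && *v)
            return v;
    }
    return NULL;
}

/* Convert a locale name such as "zh_TW.Big5" into a lowercase
 * RFC 3066 style tag such as "zh-tw".  Returns 0 on success and -1
 * when the locale carries no language ("C", "POSIX", empty) or the
 * tag does not fit in out. */
int locale_to_lang(const char *locale, char *out, size_t size)
{
    size_t i;

    if (!locale || size == 0 ||
        strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
        return -1;

    /* Copy up to the codeset ('.') or modifier ('@') part. */
    for (i = 0; locale[i] != '\0' && locale[i] != '.' && locale[i] != '@'; i++) {
        if (i + 1 >= size)
            return -1;
        out[i] = locale[i] == '_' ? '-'
                                  : (char)tolower((unsigned char)locale[i]);
    }
    out[i] = '\0';
    return i > 0 ? 0 : -1;
}
```

The resulting tag would only be used as a default when the application supplies no language of its own, so explicit requests are unaffected.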

Or would this only lead to confusion and chaos?

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab

