I've decided that using RFC 3066 to indicate font language coverage is a 
good idea (or at least the best idea).  Owen Taylor is partially 
responsible as he's using it in Pango; the realization that HTML also 
uses this RFC for tagging the language of documents makes it pretty clear that I 
could do a lot worse.

RFC 3066 uses ISO 639 language codes and combines them with ISO 3166
country codes -- you're probably familiar with these as a part of locale
names (e.g. en_US, which RFC 3066 writes as en-US).
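A minimal sketch of splitting such a tag into its language and country parts. The helper name `split_tag` is hypothetical, and accepting locale-style names (underscore separator, `.codeset` or `@modifier` suffix) is a convenience assumed here, not something RFC 3066 specifies; only the first two subtags are looked at.

```python
from __future__ import annotations


def split_tag(tag: str) -> tuple[str, str | None]:
    """Split a tag like "zh-TW" into (language, country).

    Hypothetical helper.  Subtags are case-insensitive, so the language is
    normalized to lower case and the country to upper case.  As an
    assumption of this sketch, locale names such as "en_US.UTF-8" are
    accepted too: the codeset/modifier suffix is dropped and "_" is
    treated like "-".  Subtags after the second one are ignored.
    """
    tag = tag.split(".")[0].split("@")[0]  # drop locale codeset/modifier
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    country = parts[1].upper() if len(parts) > 1 and parts[1] else None
    return language, country


if __name__ == "__main__":
    print(split_tag("zh-TW"))        # ('zh', 'TW')
    print(split_tag("en_US.UTF-8"))  # ('en', 'US')
    print(split_tag("fr"))           # ('fr', None)
```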

My plan is to have fonts advertise the complete set of languages that they 
cover, and then to allow them to further distinguish languages with 
country codes as needed (zh-TW vs zh-CN).  

Now matching can take place using the language tags; a font supporting the
language for a different country will match "less strongly" than a font
matching the language for the correct country.  Both of these will match
more strongly than a font not supporting the language at all.  This has the
benefit of making traditional Chinese fonts preferred over Japanese fonts
for the display of simplified Chinese documents.
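The three match levels described above can be sketched as follows. The numeric scores, the function names and the sample font names are all hypothetical; only their ordering reflects the plan. As an assumption, a tag without a country subtag counts as a language-only (weaker) match against a tag that has one.

```python
from __future__ import annotations

# Match strengths, strongest first.  The values are arbitrary; only the order matters.
EXACT = 2      # same language and same country, e.g. zh-CN vs zh-CN
LANG_ONLY = 1  # same language, different or missing country, e.g. zh-CN vs zh-TW
NO_MATCH = 0   # the font does not claim the language at all


def _split(tag: str) -> tuple[str, str | None]:
    """Lower-case a tag and split it into (language, country or None)."""
    parts = tag.lower().split("-")
    return parts[0], (parts[1] if len(parts) > 1 else None)


def lang_match(wanted: str, font_tags: set[str]) -> int:
    """Strongest match between the wanted tag and any tag the font advertises."""
    want_lang, want_country = _split(wanted)
    best = NO_MATCH
    for tag in font_tags:
        lang, country = _split(tag)
        if lang != want_lang:
            continue
        if country == want_country:
            return EXACT  # nothing can beat an exact match
        best = LANG_ONLY
    return best


def rank_fonts(wanted: str, fonts: dict[str, set[str]]) -> list[str]:
    """Font names ordered from strongest to weakest match; ties keep input order."""
    return sorted(fonts, key=lambda name: lang_match(wanted, fonts[name]), reverse=True)


if __name__ == "__main__":
    fonts = {"SampleJA": {"ja"}, "SampleTC": {"zh-TW"}, "SampleSC": {"zh-CN"}}
    print(rank_fonts("zh-CN", fonts))  # ['SampleSC', 'SampleTC', 'SampleJA']
```

For a simplified Chinese document the traditional Chinese font lands between the exact match and the Japanese font, which is the ordering the plan is after.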

I think this will work better than the current hack using OS/2 
codePageRange bits.

Ok, so now I have a direction to run, but I'm missing a ton of data.

To generate language coverage for a font, I need to know what Unicode 
coverage is required for each language.  I don't want the coverage offered 
by fonts designed for the language; that's often far broader than the 
coverage needed to display text in the language.  All I want are the 
Unicode codepoints for the alphabet, abjad or logography; that way fonts 
are selected strictly on language coverage, ignoring the spurious 
punctuation and foreign characters common in encodings.
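Selecting on coverage this way reduces to a subset test: a font covers a language only if its repertoire contains every required codepoint. A minimal sketch, assuming a font's repertoire is available as a set of integers. The Latin-1 repertoire is approximated as U+0020-007E plus U+00A0-00FF, and the Turkish set is an illustrative one built from the 29-letter Turkish alphabet, not one of the coverage tables.

```python
from __future__ import annotations

# Approximate repertoire of an ISO Latin-1 font: printable ASCII plus U+00A0-U+00FF.
LATIN1_FONT = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

# Illustrative Turkish requirement: the 29 letters in both cases.  The lower
# case is spelled out because str.lower() does not apply Turkish casing
# (it maps "I" to "i", not to dotless "ı").
TURKISH_UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"
TURKISH_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"
TURKISH = {ord(c) for c in TURKISH_UPPER + TURKISH_LOWER}


def covers(font_charset: set[int], required: set[int]) -> bool:
    """True only if the font has every codepoint the language requires."""
    return required <= font_charset


def missing(font_charset: set[int], required: set[int]) -> list[int]:
    """Required codepoints the font lacks, in ascending order."""
    return sorted(required - font_charset)


if __name__ == "__main__":
    gaps = missing(LATIN1_FONT, TURKISH)
    print(covers(LATIN1_FONT, TURKISH))            # False
    print(" ".join(f"U+{cp:04X}" for cp in gaps))  # U+011E U+011F U+0130 U+0131 U+015E U+015F
```

Latin-1 has 52 of these 58 letters, so a fraction-based score would call it nearly complete; the all-or-nothing test rejects it, which is the behavior the coverage tables are meant to produce.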

My plan is to start with the 139 two-letter ISO 639-1 language codes and
add (as needed) three-letter language codes from ISO 639-2.  That still
leaves me needing coverage information for 139 languages.  

I've managed to scrounge coverage information for most European languages
along with the non-European ISO 8859 languages and the Han languages.  That's
a total of 61 languages covering a great deal of the world.  Big parts are
still missing: most of non-Arabic Africa, non-Han Asia and a smattering
of Native American languages.

If any of you have a particular interest in one of the missing languages, 
please feel free to build an appropriate coverage table.  They're usually 
quite short and easy to generate as long as one has knowledge or a 
reference to the source language.  They should be as complete as possible; 
the goal is to avoid using fonts which are missing some common codepoints.
One example is attempting to use an ISO Latin-1 font for Turkish: Latin-1 
has all but six of the codepoints needed to display Turkish (it lacks 
Ğ ğ İ ı Ş ş, U+011E-011F, U+0130-0131 and U+015E-015F), making it 
nearly complete but also completely unsuitable.  Here's an example which
should make the format abundantly clear:

# Dutch (NL)
0041-005a
0061-007a
00c4
00cb
00cf
00d6
00dc
00e4
00eb
00ef
00f6
00fc
#0132-0133      # IJ and ij ligatures
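A sketch of reading this format, assuming only what the example shows: `#` starts a comment (whole-line or trailing), and every other non-blank line is either a single hex codepoint or an inclusive hex range written `lo-hi`. The function name `parse_coverage` is hypothetical, and the sample is an excerpt of the Dutch table above.

```python
from __future__ import annotations


def parse_coverage(text: str) -> set[int]:
    """Parse a coverage table into a set of Unicode codepoints.

    Hypothetical helper.  Everything after '#' is a comment, so a
    commented-out entry such as "#0132-0133" adds nothing.  Ranges are
    inclusive at both ends.
    """
    codepoints: set[int] = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "-" in line:
            lo_s, hi_s = line.split("-", 1)
            lo, hi = int(lo_s, 16), int(hi_s, 16)
            if lo > hi:
                raise ValueError(f"line {lineno}: reversed range {line!r}")
            codepoints.update(range(lo, hi + 1))
        else:
            codepoints.add(int(line, 16))
    return codepoints


SAMPLE = """\
# Dutch (NL), excerpt
0041-005a
0061-007a
00c4
#0132-0133      # IJ and ij ligatures
"""

if __name__ == "__main__":
    dutch = parse_coverage(SAMPLE)
    print(len(dutch))  # 53: 26 upper case + 26 lower case + U+00C4
```

Because the ligature line is commented out, U+0132/U+0133 are not required, so a font lacking the IJ ligatures can still be selected for Dutch.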

I attach a listing of the ISO 639-1 language codes with a '*' marking the 
languages for which I have coverage information.  For those uninterested 
in the mechanics of generating the tables, please send references to 
places where I can find coverage information for missing languages.  I'm
willing to take information in whatever format you have.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab

------

Lang    Done    Description
                (name, where spoken, language family and branch,
                approx. millions of speakers, notes)

AA              Afar Djibouti, N Ethiopia Hamito-Semitic F., Cushitic Br.
AB      *       Abkhazian Abkhazia (Georgia) Caucasian F.
AF              Afrikaans South Africa, Namibia Indo-European F., Germanic Br. 10
AM              Amharic Ethiopia Hamito-Semitic F., Semitic Br. 20
AR      *       Arabic Middle East, N Africa Hamito-Semitic F., Semitic Br. 218
AS              Assamese Assam (India) Indo-European F., Indo-Iranian Br. 23
AY              Aymara Bolivia, Peru Andean-Equatorial F., Andean Br. 2
AZ      *       Azerbaijani Iran, Azerbaijan Uralo-Altaic F., Turkic Br. 15
BA      *       Bashkir Bashkir (S Urals, Russia) Uralo-Altaic F., Turkic Br. 1
BE      *       Byelorussian Byelorussia Indo-European F., Balto-Slavic Br. 10
BG      *       Bulgarian Bulgaria, Yugoslavia, Greece Indo-European F.,
                Balto-Slavic Br. 9
BH              Bihari Bihar (India) Indo-European F., Indo-Iranian Br.
BI              Bislama Vanuatu, New Caledonia English-based creole, Pacific
BN              Bengali, Bangla Bangladesh, West Bengal (India) Indo-European F.,
                Indo-Iranian Br. 196
BO              Tibetan Tibet, Bhutan, Nepal, India Sino-Tibetan F.,
                Tibeto-Burmese Br. 5 BO from Bodskad
BR      *       Breton Brittany (W France) Indo-European F., Celtic Br.
CA      *       Catalan Catalonia (NE Spain), Balearic Islands, Sardinia,
                S France, Andorra, Argentina Indo-European F., Italic Br. 9
CO      *       Corsican Corsica (France) Indo-European F., Italic Br.
CS      *       Czech Czech Republic Indo-European F., Balto-Slavic Br. 11
CY              Welsh Wales (United Kingdom) Indo-European F., Celtic Br.
DA      *       Danish Denmark, Germany Indo-European F., Germanic Br. 5
DE      *       German Germany, Austria, Switzerland, U.S.A. Indo-European F.,
                Germanic Br. 121 DE from Deutsch
DZ              Bhutani, Bhutanese Bhutan Sino-Tibetan F., Tibeto-Burmese Br.
EL      *       Greek Greece, Cyprus, Turkey Indo-European F., Hellenic Br. 12
EN      *       English North America, British Isles, Australia, New Zealand,
                South Africa Indo-European F., Germanic Br. 470
EO      *       Esperanto 2 Artificial language
ES      *       Spanish Spain, Latin America, U.S.A. Indo-European F., Italic Br. 381
ET      *       Estonian Estonia Uralo-Altaic F., Finno-Ugric Br. 1
EU      *       Basque W Pyrenees (France, Spain) (Isolate) EU from Euskera
FA              Persian Iran, Afghanistan Indo-European F., Indo-Iranian Br. 35
                FA from Farsi
FI      *       Finnish, Suomi Finland, Russia, Sweden Uralo-Altaic F.,
                Finno-Ugric Br. 6
FJ              Fiji, Fijian Fiji Austric F., Malayo-Polynesian Br.
FO      *       Faroese, Faeroese Faeroe Islands (Denmark) Indo-European F.,
                Germanic Br.
FR      *       French France, Belgium, Canada, U.S.A., Switzerland
                Indo-European F., Italic Br. 124
FY      *       Frisian Frisian Islands (Netherlands-Germany) Indo-European F.,
                Germanic Br.
GA      *       Irish Ireland Indo-European F., Celtic Br. GA from Gaeilge
GD      *       Scots Gaelic Scotland Indo-European F., Celtic Br.
GL      *       Galician Spanish Galicia Indo-European F., Italic Br. 4
GN              Guaraní Paraguay, Bolivia, S Brazil Andean-Equatorial F.,
                Equatorial Br. 4
GU              Gujarati, Gujerati Gujarat (India), Bombay, Pakistan,
                South Africa Indo-European F., Indo-Iranian Br. 40
HA              Hausa N Nigeria, Niger, Cameroun Hamito-Semitic F., Chadic Br. 37
HE      *       Hebrew Israel Hamito-Semitic F., Semitic Br. 5 Formerly IW
                from Iwrith. See Note 4.
HI              Hindi India, Pakistan, Trinidad, Guyana, Fiji, Mauritius
                Indo-European F., Indo-Iranian Br. 418 Same as Urdu [UR]
                except for writing system. See Note 3.
HR      *       Croatian, Croat Croatia Indo-European F., Balto-Slavic Br.
                HR from Hrvatski. See Note 2.
HU      *       Hungarian, Magyar Hungary, Romania, Yugoslavia, Czechoslovakia
                Uralo-Altaic F., Finno-Ugric Br. 14
HY      *       Armenian Armenia, Middle East Indo-European F., Armenian Br. 5
                HY from Hayeren
IA              Interlingua Artificial language
ID              Indonesian, Bahasa Indonesia Indonesia, Malaysia, Thailand,
                Singapore, Brunei Austric F., Malayo-Polynesian Br.
                Formerly IN. See Note 4.
IE              Interlingue Artificial language. Prototype of Interlingua [IA]
IK              Inupiak Greenland, N Canada, Alaska (U.S.A.) Eskimo-Aleut F.
IS      *       Icelandic Iceland Indo-European F., Germanic Br. IS from Islenzk
IT      *       Italian Italy, U.S.A., France, Argentina, Switzerland, Canada, Brazil 
Indo-European F., Italic Br. 62
IU              Inuktitut NE Canada Eskimo-Aleut F. See Note 5.
JA      *       Japanese, Nihongo Japan, Brazil, California (U.S.A.),
                Hawaii (U.S.A.) Japanese-Korean F. 126
JW              Javanese Java, Malaysia, Surinam Austric F., Malayo-Polynesian
                Br. 64 JW from Bahasa Jawa
KA      *       Georgian Georgia Caucasian F. 3 KA from Kartuli
KK      *       Kazakh Kazakhstan, Sinkiang (China), Afghanistan Uralo-Altaic F.,
                Turkic Br. 8
KL      *       Greenlandic Greenland Eskimo-Aleut F. KL from Kalaallisut
KM              Cambodian Cambodia, Thailand, Viet Nam Austric F.,
                Austro-Asiatic Br. 9 KM from Khmer
KN              Kannada Karnataka (India) Dravidian F. 44
KO      *       Korean, Choson-o South Korea, North Korea, NE China, Japan,
                Siberia, Hawaii (U.S.A.) Japanese-Korean F. 75
KS              Kashmiri Kashmir (India-Pakistan) Indo-European F., Indo-Iranian Br. 4
KU              Kurdish, Zimany Kurdy Turkey, Iran, Iraq, Syria Indo-European F.,
                Indo-Iranian Br. 11
KY              Kirghiz Kirghiz, Sinkiang (China), Afghanistan Uralo-Altaic F.,
                Turkic Br. 2 KY from Kyrgyz
LA      *       Latin Indo-European F., Italic Br. Ancient language nearing extinction
LN              Lingala, liNgala Zaire, Congo Niger-Kordofanian F., Non-Mande Br. 7
LO              Laothian, Pha Xa Lao, Lao Laos, Thailand Sino-Tibetan F.,
                Sino-Siamese Br. 4
LT      *       Lithuanian Lithuania Indo-European F., Balto-Slavic Br. 3
LV      *       Latvian, Lettish Latvia Indo-European F., Balto-Slavic Br. 2
MG              Malagasy Madagascar Austric F., Malayo-Polynesian Br. 12
MI              Maori New Zealand Austric F., Malayo-Polynesian Br.
MK      *       Macedonian Macedonia, Bulgaria, Greece Indo-European F.,
                Balto-Slavic Br. 2
ML              Malayalam Kerala (SW India) Dravidian F. 35
MN              Mongolian Mongolia Uralo-Altaic F., Mongolic Br.
MO      *       Moldavian
MR              Marathi, Mahrati Maharashtra (W India) Indo-European F.,
                Indo-Iranian Br. 69
MS              Malay Malaysia, Indonesia Austric F., Malayo-Polynesian Br. 155
                MS from Bahasa Malaysia
MT      *       Maltese Malta Hamito-Semitic F., Semitic Br.
MY              Burmese Burma, Bangladesh Sino-Tibetan F., Tibeto-Burmese Br. 30
                MY from Myanmasa
NA              Nauru, Nauruan Nauru Austric F., Malayo-Polynesian Br.
NE              Nepali, Nepalese Nepal, Uttar Pradesh (India) Indo-European F.,
                Indo-Iranian Br. 16
NL      *       Dutch Netherlands, Belgium Indo-European F., Germanic Br. 21
                NL from Nederlands
NO      *       Norwegian Norway Indo-European F., Germanic Br. 5
OC      *       Occitan S France Indo-European F., Italic Br. 4
OM              (Afan) Oromo, Galla Ethiopia, Kenya Hamito-Semitic F., Cushitic Br. 10
OR              Oriya Orissa (E India) Indo-European F., Indo-Iranian Br. 31
PA              Punjabi Punjab (India), Pakistan Indo-European F., Indo-Iranian
                Br. 93 PA from Panjabi
PL      *       Polish Poland, U.S.A. Indo-European F., Balto-Slavic Br. 44
PS              Pashto, Pushto, Pushtu Afghanistan, Pakistan Indo-European F.,
                Indo-Iranian Br. 21
PT      *       Portuguese Brazil, Portugal, Spain, Uruguay, Argentina, Azores,
                Goa, Madeira Indo-European F., Italic Br. 182
QU              Quechua Peru, Ecuador, Bolivia Andean-Equatorial F., Andean Br. 8
RM      *       Rhaeto-Romance, Rhaeto-Romanic, Romansch S Switzerland, N Italy,
                Tyrol (Austria) Indo-European F., Italic Br.
RN              Kirundi, kiRundi Niger-Kordofanian F., Non-Mande Br.
RO      *       Romanian, Rumanian Rumania Indo-European F., Italic Br. 25
RU      *       Russian Russia, former USSR republics Indo-European F.,
                Balto-Slavic Br. 288
RW              Kinyarwanda, kinyaRuanda Rwanda, Burundi, Uganda, Zaire, Tanzania
                Niger-Kordofanian F., Non-Mande Br. RW from Rwanda
SA              Sanskrit India Indo-European F., Indo-Iranian Br. Ancient language
SD              Sindhi Pakistan, Sind (India) Indo-European F., Indo-Iranian Br. 18
SG              Sangho, Sango-Ngbandi Central African Republic, Zaire
                Niger-Kordofanian F., Non-Mande Br. 4
SH      *       Serbo-Croatian Croatia Indo-European F., Balto-Slavic Br. 20
                See Note 2.
SI              Singhalese, Sinhalese Sri Lanka Indo-European F., Indo-Iranian Br. 13
SK      *       Slovak Slovakia Indo-European F., Balto-Slavic Br. 5
SL      *       Slovenian, Slovene Slovenia, Italy, Austria Indo-European F.,
                Balto-Slavic Br. 2
SM              Samoan Samoa Austric F., Malayo-Polynesian Br.
SN              Shona, chiShona Rhodesia, Mozambique Niger-Kordofanian F.,
                Non-Mande Br. 8
SO              Somali Somalia, Ethiopia, Kenya Hamito-Semitic F., Cushitic Br. 5
SQ      *       Albanian Albania, Kosovo (Yugoslavia), Italy, Greece
                Indo-European F., Albanian Br. 5 SQ from Shqip
SR      *       Serbian Serbia Indo-European F., Balto-Slavic Br. SR from Srpski.
                See Note 2.
SS              Siswati, siSwati South Africa, Rhodesia, Swaziland
                Niger-Kordofanian F., Non-Mande Br.
ST              Sesotho, siSuthu South Africa, Lesotho, Botswana
                Niger-Kordofanian F., Non-Mande Br.
SU              Sundanese West Java Austric F., Malayo-Polynesian Br. 26
SV      *       Swedish Sweden, Finland Indo-European F., Germanic Br. 9
                SV from Svenska
SW              Swahili, kiSwahili Tanzania, Comoro Islands, Kenya, Mozambique,
                Zaire Niger-Kordofanian F., Non-Mande Br. 48
TA              Tamil Tamil Nadu (S India), Sri Lanka, Malaysia, Singapore
                Dravidian F. 71
TE              Telugu, Telegu Andhra Pradesh (India) Dravidian F. 73
TG              Tajik, Tajiki Tadzhikstan Indo-European F., Indo-Iranian Br. 5
TH      *       Thai Thailand 50
TI              Tigrinya N Ethiopia Hamito-Semitic F., Semitic Br. 4
TK              Turkmen, Turkoman, Turcoman Turkmenistan, Iran, Afghanistan
                Uralo-Altaic F., Turkic Br. 3
TL              Tagalog Philippines Austric F., Malayo-Polynesian Br. 54
TN              Setswana South Africa
TO              Tonga Niger-Kordofanian F., Non-Mande Br. 7
TR      *       Turkish Turkey, Bulgaria, Yugoslavia, Cyprus, Greece
                Uralo-Altaic F., Turkic Br. 59
TS              Tsonga 3
TT              Tatar, Tartar Tatarstan Uralo-Altaic F., Turkic Br. 8
TW              Twi, Tshi W Africa Niger-Kordofanian F., Non-Mande Br.
UG              Uigur, Uighur, Uyghur Sinkiang (China), Kazakhstan, Uzbekistan,
                Afghanistan Uralo-Altaic F., Turkic Br. 8 See Note 5.
UK      *       Ukrainian Ukraine, Canada, U.S.A. Indo-European F., Balto-Slavic Br. 47
UR              Urdu Pakistan, India Indo-European F., Indo-Iranian Br. 102
                Same as Hindi [HI] except for writing system. See Note 3.
UZ              Uzbek, Uzbeg, Usbek, Usbeg Uzbekstan, Tadzhikstan, Afghanistan
                Uralo-Altaic F., Turkic Br. 14
VI              Vietnamese Viet Nam, Thailand, Cambodia, Laos, New Caledonia,
                France, Dakar Sino-Tibetan F., Sino-Siamese Br. 65
VO              Volapük Artificial language
WO              Wolof Senegal, Gambia Niger-Kordofanian F., Non-Mande Br. 7
XH              Xhosa, Xosa, isiXhosa South Africa, Rhodesia, Swaziland
                Niger-Kordofanian F., Non-Mande Br. 8
YI      *       Yiddish U.S.A., Israel, former USSR, Latin America, Canada,
                E Europe Indo-European F., Germanic Br. Formerly JI from
                Jiddisch. See Note 4.
YO              Yoruba Western, Lagos and Kwara States (Nigeria), Benin
                Niger-Kordofanian F., Non-Mande Br. 20
ZA              Zhuang, Chwang, Chuang China 15 See Note 5.
ZH      *       Chinese China Sino-Tibetan F., Sino-Siamese Br. 1,200
                ZH from Zhongwen. See Note 1.
ZU              Zulu, isiZulu South Africa, Rhodesia, Swaziland
                Niger-Kordofanian F., Non-Mande Br. 9


_______________________________________________
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts
