One offshoot of the recent i18n/Pango discussion is that it finally dawned on me just how powerful our *existing* Unicode support in 1.0 already is -- without BiDi or Pango.
Provided that users can locate appropriate fonts, that is.

It might be helpful to segregate the languages we support into the following broad categories:

  1. easy
  2. easy, with the right font
  3. bidi
  4. complex shaping required (including combining characters)

As the World.abw test document demonstrates, there are a *lot* of languages which fall into the first two categories.

the "just fonts" languages
--------------------------
Not only are there thirty-some Latin-1 languages which definitely fall into the first category (most fonts support them), but some of the small, general-purpose Unicode fonts being deployed add "just enough" glyphs to support an even broader range of languages.

  http://www.abisource.com/mailinglists/abiword-dev/02/Apr/1036.html

Indeed, after doing some more digging, we can support content in many more languages by just locating a font that includes enough glyphs in the appropriate Unicode range.

  http://www.alanwood.net/unicode/fonts.html

For example, the government of Nunavut has recently created Unicode fonts for Inuktitut:

  http://www.assembly.nu.ca/unicode/fonts/
  http://www.assembly.nu.ca/unicode/fonts/beginner.html

I can't read them, of course, but they sure look pretty. :-)

the "harder" languages
----------------------
Of course, there *are* languages for which we'll need more than just fonts. For example, Tomas has hand-coded a lot of support for bidi languages, a category which includes:

  ar, fa, he, ur, yi

Now we're investigating Pango since, in addition to BiDi support, it should (eventually) encapsulate knowledge about the more complex typographic needs of languages which don't have discrete Unicode codepoints for all of the glyphs needed.

Andrew keeps mentioning Vietnamese (vi-VN), and I know that other South Asian languages need this, but how extensive is the rest of this category?

the question
------------
OK, i18n experts ... is this a useful, clean distinction? If not, please let me know what I've garbled here.
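
To make the four categories a bit more concrete, here's a rough Python sketch of how a piece of text might be sorted into them using nothing but the Unicode character database. This is purely illustrative: the function names and the rules themselves are my assumptions, not anything that exists in AbiWord, and the heuristic is crude (for instance, it files unvocalized Arabic under "bidi" even though Arabic also needs contextual shaping).

```python
import unicodedata

# Category numbers follow the list at the top of this mail.
LABELS = {
    1: "easy",
    2: "easy, with the right font",
    3: "bidi",
    4: "complex shaping required",
}


def char_category(ch):
    """Heuristic category for one character (hypothetical helper)."""
    if unicodedata.category(ch).startswith("M"):
        return 4  # combining or spacing mark, so some shaping is needed
    if unicodedata.bidirectional(ch) in ("R", "AL"):
        return 3  # strong right-to-left character
    if ord(ch) > 0xFF:
        return 2  # outside Latin-1, so a suitable font must be found
    return 1      # Latin-1, covered by most fonts


def text_category(text):
    """A text is as hard as its hardest character (assumed rule)."""
    return max((char_category(ch) for ch in text), default=1)


if __name__ == "__main__":
    samples = {
        "Latin-1": "caf\u00e9",
        "Inuktitut": "\u1403\u14c4\u1483\u144e\u1450\u1466",
        "Hebrew": "\u05e9\u05dc\u05d5\u05dd",
        "Hindi": "\u0939\u093f\u0928\u094d\u0926\u0940",
    }
    for name, text in samples.items():
        print(name, "->", LABELS[text_category(text)])
```

Running something like this over each paragraph of World.abw would give a first-cut list of which languages are candidates for the "just fonts" treatment, to be checked by people who actually know the scripts.
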

bottom line
-----------
I'm thrilled that we've got dedicated folks working on solving the "harder" language problems. However, I'd love to see some folks do more research on improving our support for "just fonts" languages as follows:

  - come up with a complete list of such languages
  - come up with a list of the fonts needed to support each of them

Note that this is essentially a web research task, not a coding task. The ultimate goal would be to learn enough so that we could write a quick website entry for each language, telling users:

  - who's responsible for the translation
  - where to find dictionaries (if any)
  - where to find fonts
  - etc.

For example, two sample entries might be

  Indonesian (id-ID)
  ------------------
  translators: Tim Allen, ...
  dictionary:  (n/a)
  fonts:       ...
  sample:      (the UTF-8 gobbledygook from World.abw)
  picture:     (screenshot of the same)

  Inuktitut (iu-CA)
  -----------------
  translators: (n/a)
  dictionary:  (n/a)
  fonts:       http://www.assembly.nu.ca/unicode/fonts/
  sample:      (the UTF-8 gobbledygook from World.abw)
  picture:     (screenshot of the same)

Best of all, this could increase our language support for the 1.0.* series of products, while waiting for all the hard coding work to get done for the set of other languages which actually *do* need BiDi and/or Pango.

Does this sound interesting? Is anyone interested in coordinating such an effort? It seems like a large task to write up as a uPOW.

Paul
