One offshoot of the recent i18n/Pango discussion is that it finally dawned 
on me just how powerful our *existing* Unicode support in 1.0 already is, 
even without BiDi or Pango. 

Provided that users can locate appropriate fonts, that is.  

It might be helpful to segregate the languages we support into the following 
broad categories:

  1. easy
  2. easy, with the right font
  3. bidi
  4. complex shaping required (including combining characters)
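To make these categories a bit more concrete, here's a rough Python sketch 
that guesses a category for a sample string from Unicode character 
properties. It's a hypothetical heuristic, not anything AbiWord actually 
does. It treats right-to-left letters (bidi class R or AL) as BiDi, combining 
marks (general category Mn, Mc or Me) as needing complex shaping, and 
anything outside Latin-1 as needing "the right font". It can't tell whether 
a given user actually *has* that font.

```python
import unicodedata

# Category numbers follow the list above.  Hypothetical heuristic only.
EASY, NEEDS_FONT, BIDI, COMPLEX = 1, 2, 3, 4

def classify(text: str) -> int:
    """Guess which category a text sample falls into."""
    category = EASY
    for ch in text:
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            # Right-to-left letters (Hebrew, Arabic, ...): BiDi takes priority.
            return BIDI
        if unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            # Combining marks have to be positioned or shaped by the renderer.
            category = COMPLEX
        elif ord(ch) > 0xFF and category < NEEDS_FONT:
            # Outside Latin-1: a rough proxy for "needs a font with more glyphs".
            category = NEEDS_FONT
    return category

print(classify("Gr\u00fc\u00df Gott"))                   # -> 1 (Latin-1 only)
print(classify("\u1403\u14c4\u1483\u144e\u1450\u1466"))  # -> 2 (syllabics)
print(classify("\u05e9\u05dc\u05d5\u05dd"))              # -> 3 (Hebrew)
print(classify("\u0928\u092e\u0938\u094d\u0924\u0947"))  # -> 4 (Devanagari)
```

Note that BiDi wins over complex shaping here, so Arabic (which needs both) 
lands in category 3.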

As the World.abw test document demonstrates, there are a *lot* of languages 
which fall into the first two categories.  

the "just fonts" languages
--------------------------
Not only are there thirty-some Latin-1 languages which definitely fall into 
the first category (most fonts support them), but some of the small, 
general-purpose Unicode fonts being deployed add "just enough" glyphs to 
support an even broader range of languages.  

  http://www.abisource.com/mailinglists/abiword-dev/02/Apr/1036.html

Indeed, after some more digging, it looks like we can support content in many 
more languages just by locating a font that includes enough glyphs in the 
appropriate Unicode range.  

  http://www.alanwood.net/unicode/fonts.html
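Checking whether a particular font covers a language is mostly a matter of 
comparing the characters in a sample against the codepoints the font has 
glyphs for. Here's a minimal sketch; the helper name is hypothetical, and it 
assumes you already have the font's coverage as a set of integers. (With the 
fontTools Python library, for instance, `TTFont(path).getBestCmap()` returns 
a dict keyed by codepoint, so its keys would do.)

```python
def missing_codepoints(sample: str, covered: set[int]) -> list[str]:
    """List, in order and without duplicates, the codepoints in `sample`
    that are not in `covered`, formatted as U+XXXX.  Whitespace is ignored."""
    missing: list[str] = []
    seen: set[int] = set()
    for ch in sample:
        cp = ord(ch)
        if ch.isspace() or cp in covered or cp in seen:
            continue
        seen.add(cp)
        missing.append(f"U+{cp:04X}")
    return missing

# A font that only covers printable ASCII...
ascii_font = set(range(0x20, 0x7F))
# ...versus one that also covers the Unified Canadian Aboriginal Syllabics
# block (U+1400..U+167F), where the Inuktitut syllabics live.
syllabics_font = ascii_font | set(range(0x1400, 0x1680))

syllabics_sample = "\u1403\u14c4\u1483\u144e\u1450\u1466"
print(missing_codepoints(syllabics_sample, ascii_font))      # all six missing
print(missing_codepoints(syllabics_sample, syllabics_font))  # []
```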

For example, the government of Nunavut has recently created Unicode fonts 
for Inuktitut:  

  http://www.assembly.nu.ca/unicode/fonts/
  http://www.assembly.nu.ca/unicode/fonts/beginner.html

I can't read them, of course, but they sure look pretty.  :-)

the "harder" languages
----------------------
Of course, there *are* languages for which we'll need more than just fonts.  
For example, Tomas has hand-coded a lot of support for bidi languages, a 
category which includes:

  ar, fa, he, ur, yi

Now we're investigating Pango since, in addition to BiDi support, it should 
(eventually) encapsulate knowledge about the more complex typographic needs 
of languages which don't have discrete Unicode codepoints for all of the 
glyphs needed.  Andrew keeps mentioning Vietnamese (vi-VN), and I know that 
other South and Southeast Asian languages need this too, but how extensive 
is the rest of this category?
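The "discrete codepoints" distinction is easy to see with Python's standard 
`unicodedata` module. Normalizing to NFC folds a base letter plus combining 
marks into a single precomposed codepoint when Unicode has one (as for the 
Vietnamese letter U+1EBF), but a Devanagari conjunct such as KA + VIRAMA + SSA 
stays as three codepoints that a shaping engine has to draw as one glyph. 
This is only an illustration of the distinction, not a claim about exactly 
which languages end up in which category.

```python
import unicodedata

def nfc(s: str) -> str:
    """Canonical composition: use precomposed codepoints where they exist."""
    return unicodedata.normalize("NFC", s)

# e + COMBINING CIRCUMFLEX ACCENT + COMBINING ACUTE ACCENT
viet = nfc("e\u0302\u0301")
print(len(viet), f"U+{ord(viet):04X}")  # 1 U+1EBF (one precomposed letter)

# KA + VIRAMA + SSA: no precomposed form, so NFC leaves all three codepoints
# and the renderer must form the conjunct itself.
conjunct = nfc("\u0915\u094d\u0937")
print(len(conjunct))  # 3
```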

the question
------------
OK, i18n experts ... is this a useful, clean distinction?  If not, please 
let me know what I've garbled here. 

bottom line
-----------
I'm thrilled that we've got dedicated folks working on solving the "harder" 
language problems.  However, I'd love to see some folks do more research on 
improving our support for "just fonts" languages as follows:

  - come up with a complete list of such languages
  - come up with a list of the fonts needed to support each of them

Note that this is essentially a web research task, not a coding task.  The 
ultimate goal would be to learn enough so that we could write a quick 
website entry for each language, telling users:

  - who's responsible for the translation
  - where to find dictionaries (if any)
  - where to find fonts
  - etc. 

For example, two sample entries might look like this:

  Indonesian (id-ID)
  ------------------
  translators:  Tim Allen, ...
  dictionary:   (n/a)
  fonts:        ...
  sample:       (the UTF-8 gobbledygook from World.abw)
  picture:      (screenshot of the same)

  Inuktitut (iu-CA)
  -----------------
  translators:  (n/a)
  dictionary:   (n/a)
  fonts:        http://www.assembly.nu.ca/unicode/fonts/
  sample:       (the UTF-8 gobbledygook from World.abw)
  picture:      (screenshot of the same)
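If we do end up with a page of these entries, keeping them as structured 
data would make them easy to regenerate. A minimal sketch, where the 
`LanguageEntry` class and its fields are hypothetical and simply mirror the 
sample entries above (the sample and picture fields are left out):

```python
from dataclasses import dataclass, field

def _or_na(values: list[str]) -> str:
    """Join a list for display, or show (n/a) when it is empty."""
    return ", ".join(values) if values else "(n/a)"

@dataclass
class LanguageEntry:
    name: str                 # e.g. "Inuktitut"
    tag: str                  # language-region tag, e.g. "iu-CA"
    translators: list[str] = field(default_factory=list)
    dictionaries: list[str] = field(default_factory=list)
    fonts: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the entry in the same plain-text layout as the samples."""
        title = f"{self.name} ({self.tag})"
        return "\n".join([
            title,
            "-" * len(title),
            f"translators:  {_or_na(self.translators)}",
            f"dictionary:   {_or_na(self.dictionaries)}",
            f"fonts:        {_or_na(self.fonts)}",
        ])

entry = LanguageEntry("Inuktitut", "iu-CA",
                      fonts=["http://www.assembly.nu.ca/unicode/fonts/"])
print(entry.render())
```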

Best of all, this could increase our language support for the 1.0.* series 
of products while we wait for the difficult coding work to be done for the 
languages that actually *do* need BiDi and/or Pango.  

Does this sound interesting?  Is anyone interested in coordinating such an 
effort?  It seems like a large task to write up as a uPOW.  

Paul
