[A12n-policy] BBC.co.uk languages - mostly not UTF-8

Don Osborn Sun, 12 Apr 2009 18:30:05 -0700

A quick review of coding on BBC World Service pages in diverse languages at
http://www.bbc.co.uk/worldservice/languages/ reveals a diversity of
charset codes used, with most pages *not* in utf-8.  I suspect that BBC is
anticipating the kinds of systems that users in each language population
will rely on, trying to accommodate the least sophisticated systems and font
repertoires.  Assuming that their read is accurate (and that they're not
just being conservative about making the change to utf-8), this would
seem to be an interesting window on how widespread the use of Unicode is or
is not at the present time.  On the other hand, it is worth noting that no
Latin-based orthography is displayed on bbc.co.uk in utf-8, even when
characters beyond Latin-1 are used (Turkish) or should be used (Hausa). If
one had the time, it would be interesting to look also at other
international radio sites - VOA, RFI, Deutsche Welle, Radio China, etc.
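
For anyone who wants to repeat this kind of review on other sites, here is a minimal sketch in Python of reading the charset a page declares in its <meta> tag. The function name declared_charset and the sample HTML string are hypothetical, made up for illustration; the sketch ignores any charset sent in the HTTP Content-Type header, which can differ from the one in the markup.

```python
import re

# Matches charset=... inside a <meta> tag, covering both the older
# http-equiv/content form and the short <meta charset="..."> form.
CHARSET_RE = re.compile(
    r'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)

def declared_charset(html):
    """Return the lowercased charset declared in a <meta> tag, or None."""
    match = CHARSET_RE.search(html)
    return match.group(1).lower() if match else None

# Hypothetical page head, not copied from any real site.
sample = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=windows-1254"></head></html>'
)
print(declared_charset(sample))
```

The same function could be applied to the saved source of each language page to build a list like the one further down.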


One of the questions I have is whether we can expect that all web content
(at least on high-profile international sites) will eventually go to utf-8
or another Unicode encoding, or whether various non-Unicode 8-bit standards
will continue to hold sway in selected areas for some time to come.  I think that
in the "ecology" of localization in a region such as West Africa, the use or
non-use of utf-8 by international websites for a language like Hausa (which
basically is the difference between being able to use the formal orthography
or resorting to an ASCIIfied transcription as they currently do) certainly
has an effect on the way that that language and others are used in text
offline. At what point does the argument that too many local systems in a
region do not have Unicode fonts lose its validity, and at what point should
organizations like BBC take the lead in using utf-8 (as it did a
while back with a Unicode font for Urdu)?
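
To make the Hausa and Turkish point concrete, here is a small Python sketch that checks whether a string can be represented in a given charset. The helper can_encode and the sample strings are assumptions chosen for illustration: the hooked letters ɓ, ɗ and ƙ stand in for the formal Hausa orthography, and ş, ğ, ı and İ for the Turkish letters outside Latin-1.

```python
def can_encode(text, charset):
    """Return True if every character in text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Illustrative samples, not taken from the BBC pages themselves.
hausa_hooked = "ɓɗƙ"   # hooked letters of the formal Hausa orthography
turkish = "şğıİ"       # Turkish letters outside Latin-1

print(can_encode(hausa_hooked, "iso-8859-1"))   # False
print(can_encode(hausa_hooked, "utf-8"))        # True
print(can_encode(turkish, "iso-8859-1"))        # False
print(can_encode(turkish, "windows-1254"))      # True
```

This suggests why the Turkish page can get by with windows-1254 while a Hausa page declared as iso-8859-1 has no way to carry the hooked letters as plain characters.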

BBC lists 32 languages, but two of them - Kinyarwanda and Kirundi - lead to
the same "Great Lakes" page (the two languages are mutually intelligible).  Also
for the sake of this list, I count Portuguese only once, even though BBC has
Brazilian and African varieties separate. Hence the total below comes to 30.

Albanian  charset=windows-1250

Arabic  charset=windows-1256

Azeri  charset=utf-8

Bangla  charset=utf-8

Burmese  charset=utf-8

Chinese  charset=gb2312

English (Caribbean)  charset=iso-8859-1

French  charset=iso-8859-1

Hausa  charset=iso-8859-1

Hindi  charset=utf-8

Indonesian  charset=iso-8859-1

Kinyarwanda (& Kirundi)  charset=iso-8859-1

Kyrgyz  charset=utf-8

Macedonian  charset=windows-1251

Nepali  charset=utf-8

Pashto  charset=utf-8

Persian  charset=utf-8

Portuguese  (both Brazilian and African)  charset=iso-8859-1

Russian  charset=windows-1251

Serbian  charset=windows-1250

Sinhala  charset=utf-8

Somali  charset=iso-8859-1

Spanish  charset=iso-8859-1

Swahili  charset=iso-8859-1

Tamil  charset=utf-8

Turkish  charset=windows-1254

Ukrainian  charset=windows-1251

Urdu  charset=utf-8

Uzbek  charset=utf-8

Vietnamese  charset=utf-8

Totals:

13 utf-8

9 iso-8859-1

3 windows-1251

2 windows-1250

1 windows-1254

1 windows-1256

1 gb2312
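
As a cross-check on the totals, the list above can be tallied with a short Python sketch. The dictionary simply restates the language/charset pairs given in this message; the variable names are illustrative.

```python
from collections import Counter

# Language -> declared charset, as listed in this message.
charsets = {
    "Albanian": "windows-1250", "Arabic": "windows-1256",
    "Azeri": "utf-8", "Bangla": "utf-8", "Burmese": "utf-8",
    "Chinese": "gb2312", "English (Caribbean)": "iso-8859-1",
    "French": "iso-8859-1", "Hausa": "iso-8859-1", "Hindi": "utf-8",
    "Indonesian": "iso-8859-1",
    "Kinyarwanda (& Kirundi)": "iso-8859-1", "Kyrgyz": "utf-8",
    "Macedonian": "windows-1251", "Nepali": "utf-8", "Pashto": "utf-8",
    "Persian": "utf-8", "Portuguese": "iso-8859-1",
    "Russian": "windows-1251", "Serbian": "windows-1250",
    "Sinhala": "utf-8", "Somali": "iso-8859-1", "Spanish": "iso-8859-1",
    "Swahili": "iso-8859-1", "Tamil": "utf-8",
    "Turkish": "windows-1254", "Ukrainian": "windows-1251",
    "Urdu": "utf-8", "Uzbek": "utf-8", "Vietnamese": "utf-8",
}

totals = Counter(charsets.values())
for name, count in totals.most_common():
    print(count, name)

# Pages not declared as utf-8, out of all pages counted.
non_utf8 = sum(n for cs, n in totals.items() if cs != "utf-8")
print(non_utf8, "of", len(charsets), "not utf-8")
```

The tally gives 17 of the 30 pages declared in something other than utf-8.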

_______________________________________________
A12n-policy mailing list
A12n-policy@bisharat.net
http://lists.bisharat.net/mailman/listinfo/a12n-policy
