Re: Comments on locale name guideline: CODESET names

Roozbeh Pournader Wed, 20 Jun 2001 13:13:44 -0700


On Wed, 20 Jun 2001, Markus Kuhn wrote:

> > Arabic (language) people rarely use visual charsets.
> 
> So what exactly are Arabic users of POSIX systems using in file names,
> source code comments, email, web pages, etc.? What are the current
> setups and how many people do you think use them? Is there any widely
> used Arabic encoding that should be supported in locales in addition to
> UTF-8? What Linux terminal emulator and fonts are currently used for
> these Arabic locale definitions?

In source code comments, they use English. In email, web pages, and file
names, they use Logical Arabic, which is displayed using special software
(like Arabic mail readers and editors), or terminal emulators.

One example of these terminal emulators is "Acon", an Arabic console
modifier for Linux which does bidi and joining on the Linux console, by
taking the character array and cursor position and transforming it. Acon
is/was shipping with Mandrake at least since 7.1. Acon uses its own 8-bit
fonts and allows font and joining rules customization without the need for
recompiling the source. It handles Arabic combining marks by displaying
them after (to the left of) the base letter over a space or Tatweel (based on
context), which is the tradition of displaying those in MS-DOS systems. I
am not good at statistics, but I'm sure that there are more than 100 users
of Acon. (Pablo may be able to give more details about Acon and its user
base.) Acon supports ISO-8859-6 and WINDOWS-1256.
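
As a rough illustration of the mark placement described above, the sketch
below gives every Arabic combining mark its own carrier (a Tatweel or a
space) right after the base letter, in logical order. The context rule used
here (Tatweel when the base letter connects to a following Arabic letter,
space otherwise) is only an assumption and is not taken from the Acon
source; the names spread_marks, NON_CONNECTING and is_letter are
hypothetical.

```python
# Sketch only: the carrier rule is an assumption, not Acon's actual logic.
TATWEEL = "\u0640"
SPACE = " "

# Arabic combining marks U+064B (fathatan) through U+0652 (sukun).
MARKS = {chr(cp) for cp in range(0x064B, 0x0653)}

# Letters that do not connect to the letter that follows them
# (hamza, the alef forms, waw with hamza, teh marbuta, dal, thal, reh, zain, waw).
NON_CONNECTING = set(
    "\u0621\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648"
)


def is_letter(ch: str) -> bool:
    """True for the basic Arabic letters U+0621..U+063A and U+0641..U+064A."""
    return "\u0621" <= ch <= "\u063A" or "\u0641" <= ch <= "\u064A"


def spread_marks(text: str) -> str:
    """Put each combining mark on its own carrier after the base letter."""
    out = []
    base = None  # most recent non-mark character
    for i, ch in enumerate(text):
        if ch not in MARKS:
            out.append(ch)
            base = ch
            continue
        # First non-mark character after this mark ("" at end of text).
        nxt = next((c for c in text[i + 1:] if c not in MARKS), "")
        joins = (
            base is not None
            and is_letter(base)
            and base not in NON_CONNECTING
            and is_letter(nxt)
        )
        out.append((TATWEEL if joins else SPACE) + ch)
    return "".join(out)
```

With this rule, a fatha between two connected letters lands on a Tatweel,
while a fatha after a non-connecting letter such as Dal, or at the end of a
word, lands on a space.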

There is also Akka, based on the Acon idea, which opens a hidden terminal
and redirects bidi-ed and contextually joined text to the real terminal. I
don't know much about its user base, since it is new.

And finally there are Arabic MS-DOS boxes under Arabic Win32 systems that
are implicit Arabic terminals. Arabic speakers use these to telnet to
POSIX systems and work with them. There are also physical Arabic
terminals, based on the ECMA bidi terminal spec, as I've heard, but I have
never seen any of those myself. There are also some solutions available
from Langbox, but I have not tried those either.

In the browser world, lynx has support for ISO-8859-6 and WINDOWS-1256
logical "output". The output is directed to an Arabic terminal emulator
like those above.

> If there isn't a currently widely used Arabic terminal emulator (like
> kterm for the CJK community, which is very widely used), then the answer
> is probably that the Arabic script is not really widely used on Linux at
> the moment, and we can start supporting it from scratch properly with
> UTF-8 (see Robert Brady's work on Arabic xterm).

Arabic (language) people hardly like anything other than WINDOWS-1256 or
ISO-8859-6. Perhaps they are the most faithful opponents of UCS after some
Japanese. Even Microsoft has problems moving them from 1256 to Unicode.
We cannot switch them to UTF-8 this easily.

There are Persian charsets (mostly visual) with much more than 50 users
using them as the default on their POSIX system (mostly on SCO Unix and
Solaris). But they are not the majority, as there are at least 10 major
Persian codepages, no single one holding more than 30% of all Persian
documents. Please note that while many of these are documented, I have not
tried registering any of them with IANA, in favor of UTF-8. But the need
for Arabic (language) charsets is really serious. WINDOWS-1256 is used on
more than 80% of Arabic web pages, and Arabic speakers think of files
encoded in that when they think about Arabic text files.
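
To make the charset point concrete, here is a minimal sketch using Python's
standard codecs, where "cp1256" is the codec name for WINDOWS-1256 and
"iso8859_6" the one for ISO-8859-6. Both are single-byte encodings, but
they assign different byte values to some letters, so a file has to be
decoded with the charset it was written in. The helper name recode_to_utf8
is hypothetical.

```python
def recode_to_utf8(data: bytes, source: str = "cp1256") -> bytes:
    """Decode bytes from an 8-bit Arabic charset and re-encode as UTF-8."""
    return data.decode(source).encode("utf-8")


# The word "Arabi" in logical order: Ain, Reh, Beh, Yeh.
word = "\u0639\u0631\u0628\u064A"

as_1256 = word.encode("cp1256")       # one byte per letter
as_8859_6 = word.encode("iso8859_6")  # one byte per letter, other values
as_utf8 = word.encode("utf-8")        # two bytes per letter

print(as_1256.hex(), as_8859_6.hex(), as_utf8.hex())
```

Decoding WINDOWS-1256 bytes as ISO-8859-6 (or the reverse) either fails or
silently yields the wrong letters, which is one reason a locale needs to
name its codeset explicitly.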

To come to a conclusion, ISO-8859-6 and WINDOWS-1256 are a must to help
Arabic speakers migrate to Free Unices.

roozbeh

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/