Re: Unicode and the Linux console (again)

Edward H. Trager Tue, 11 Jan 2005 08:42:04 -0800

On Tuesday 2005.01.11 06:06:30 -0500, Behdad Esfahbod wrote:
> 
> The point people miss is that we don't have a userspace
> terminal emulator that supports all Unicode scripts yet; it is
> not even clear that the concept is well-defined, or that all scripts
> can be rendered in such a grid.  As soon as we have a clue about what it
> should look like (mlterm tries to figure this out), then we can
> think about whether anybody wants to port/write all the code in the
> kernel too.
> 
> behdad
>

Yes, maybe this is a good place to start.  That is:

 (1) Start rewriting mlterm to properly support all the Unicode 4.1 scripts
     in a userspace program.

 (2) Simultaneously work on augmenting the GNU Unifont bitmap font
     so that it has the glyphs necessary for all of those scripts.

NOTE THAT (2) REQUIRES a complete rethinking of the GNU Unifont bitmap
font and format.  What I have really been thinking of doing for a long time
(but haven't had time to do yet) is to create a database and web page for
managing the required GNU Unifont glyphs and associated information.
I would probably just use PHP/MySQL since that is a handy way to do it.
The problem is that the current GNU Unifont "database", if you will, has
only one "table" with just two "columns":

    UNICODE_VALUE    BITMAP_DATA
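For reference, the GNU Unifont source is a plain-text .hex file with one glyph
per line: a hexadecimal code point, a colon, and the bitmap as hex digits (32
digits for an 8x16 glyph, 64 for a 16x16 glyph).  A minimal Python sketch of
reading those two "columns" follows; the function names are illustrative only.
Note that the cell width is implied only by the bitmap length.

```python
def parse_hex_line(line: str) -> tuple[int, bytes]:
    """Split one unifont.hex line ("CODEPOINT:BITMAP") into its two columns."""
    codepoint_hex, bitmap_hex = line.strip().split(":", 1)
    return int(codepoint_hex, 16), bytes.fromhex(bitmap_hex)


def cells_for_bitmap(bitmap: bytes) -> int:
    """Infer console cells from bitmap size: 16 bytes = 8x16 (1 cell), 32 bytes = 16x16 (2 cells)."""
    if len(bitmap) == 16:
        return 1
    if len(bitmap) == 32:
        return 2
    raise ValueError(f"unexpected bitmap length: {len(bitmap)} bytes")


# 'A' (U+0041) as an 8x16 glyph: 16 rows, one byte per row.
cp, bitmap = parse_hex_line("0041:0000000018242442427E424242420000")
print(hex(cp), cells_for_bitmap(bitmap))
```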

This would have to be augmented to at least something like this:

    UNICODE_VALUE GLYPH_ID NO_OF_CONSOLE_CELLS BITMAP_DATA GLYPH_TYPE

... where:

* GLYPH_ID: Every glyph gets a GLYPH_ID, not every glyph gets a UNICODE_VALUE
because some glyphs represent ligatures (mandatory or optional), consonant
conjuncts, positionally-dependent forms, etc.

* NO_OF_CONSOLE_CELLS would tell you how many console cells are
required for, say, those Burmese glyphs (and, of course, for CJK).

* GLYPH_TYPE would be an enumeration of glyph types.  You need this for
positionally dependent glyphs, mandatory ligatures, and so on in Semitic
scripts (Arabic, Syriac) and in Indic and Indic-derived scripts (Devanagari,
Burmese, etc.).
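To make the proposed row concrete, here is a minimal Python sketch of one
record.  The GlyphType members are hypothetical examples drawn from the
categories above (positional forms, ligatures, consonant conjuncts), not a
settled list, and the glyph ID value is made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GlyphType(Enum):
    # Hypothetical enumeration; a real list would be worked out per script.
    BASE = "base"            # ordinary glyph with its own code point
    ISOLATED = "isolated"    # Arabic/Syriac positional forms
    INITIAL = "initial"
    MEDIAL = "medial"
    FINAL = "final"
    LIGATURE = "ligature"    # mandatory or optional ligature
    CONJUNCT = "conjunct"    # Indic consonant conjunct


@dataclass
class GlyphRecord:
    glyph_id: int                   # every glyph has one
    unicode_value: Optional[int]    # None for ligatures, conjuncts, positional forms
    no_of_console_cells: int        # e.g. 1 for Latin, 2 for CJK
    bitmap_data: bytes
    glyph_type: GlyphType


# Text only ever contains ARABIC LETTER BEH (U+0628); its initial form is a
# separate glyph chosen during shaping, so it gets a GLYPH_ID but no UNICODE_VALUE.
beh_initial = GlyphRecord(
    glyph_id=1001,                  # hypothetical ID
    unicode_value=None,
    no_of_console_cells=1,
    bitmap_data=bytes(16),          # blank 8x16 placeholder bitmap
    glyph_type=GlyphType.INITIAL,
)
```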

But even this is not enough.  One also needs a glyph substitution
table (like GSUB in OpenType) that maps a sequence of
UNICODE_VALUEs or GLYPH_IDs to the GLYPH_IDs representing mandatory ligatures,
consonant conjuncts in Indic scripts, and so on.
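A rough sketch of how such a table might be applied, assuming a simple greedy
longest-match rule in Python.  This is far simpler than real OpenType GSUB
processing (no lookup ordering, no context rules), and the ligature glyph IDs
and the default_glyph mapping are hypothetical.

```python
# Hypothetical substitution table: sequence of code points -> ligature glyph ID.
SUBSTITUTIONS: dict[tuple[int, ...], int] = {
    (0x0644, 0x0627): 2001,          # Arabic LAM + ALEF -> mandatory lam-alef ligature
    (0x0915, 0x094D, 0x0937): 2002,  # Devanagari KA + VIRAMA + SSA -> conjunct KSSA
}
MAX_LEN = max(len(k) for k in SUBSTITUTIONS)


def substitute(codepoints: list[int], default_glyph: dict[int, int]) -> list[int]:
    """Greedy longest-match substitution, loosely in the spirit of a GSUB ligature lookup."""
    out: list[int] = []
    i = 0
    while i < len(codepoints):
        # Try the longest possible sequence first, down to length 2.
        for n in range(min(MAX_LEN, len(codepoints) - i), 1, -1):
            seq = tuple(codepoints[i:i + n])
            if seq in SUBSTITUTIONS:
                out.append(SUBSTITUTIONS[seq])
                i += n
                break
        else:
            # No ligature starts here: fall back to the character's own glyph.
            out.append(default_glyph[codepoints[i]])
            i += 1
    return out


default = {0x0644: 644, 0x0627: 627, 0x0628: 628}
print(substitute([0x0628, 0x0644, 0x0627], default))  # [628, 2001]
```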

What I propose to do is:

  (1) Create the database and web-based management/development tool for the
      GNU Unifont project first.
  (2) Worry about what format to package the GNU Unifont in later.  

Having the GNU Unifont glyph data and glyph substitution tables in some
standard database format that people could access via the web, or download
in its entirety for processing, would make management and development much
easier.  For example, anyone who knows the Myanmar script would be able to
create a login account on the website, query which Myanmar bitmap glyphs have
been created so far, work on new glyphs via a web-based tool, etc.
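The "what has been done so far" query might look roughly like this.  It is a
sketch only: SQLite stands in for the proposed MySQL database, the table name
"glyphs" is hypothetical, and the sample rows (including the two-cell width
given to the Myanmar letters) are placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE glyphs (
           glyph_id            INTEGER PRIMARY KEY,
           unicode_value       INTEGER,          -- NULL for ligatures, conjuncts, etc.
           no_of_console_cells INTEGER NOT NULL,
           bitmap_data         BLOB,
           glyph_type          TEXT NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO glyphs VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0x1000, 2, None, "base"),   # MYANMAR LETTER KA (placeholder row)
        (2, 0x1001, 2, None, "base"),   # MYANMAR LETTER KHA (placeholder row)
        (3, 0x0041, 1, None, "base"),   # LATIN CAPITAL LETTER A
    ],
)

# The Myanmar block is U+1000..U+109F.
MYANMAR_FIRST, MYANMAR_LAST = 0x1000, 0x109F

rows = conn.execute(
    "SELECT unicode_value FROM glyphs WHERE unicode_value BETWEEN ? AND ?",
    (MYANMAR_FIRST, MYANMAR_LAST),
).fetchall()
done = {cp for (cp,) in rows}
# Note: this also counts code points in the block that are unassigned.
missing = [cp for cp in range(MYANMAR_FIRST, MYANMAR_LAST + 1) if cp not in done]
print(f"{len(done)} Myanmar glyphs so far, {len(missing)} code points without one")
```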

(Also, note that it doesn't have to be limited to BITMAP data.  If, in the
future, someone wants to define a STROKE-BASED "GNU UNIFONT", or something
like that, just add another column to the database for STROKED_GLYPH_DATA.
STROKED_GLYPH_DATA could be SVG data, perhaps.)
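Adding such a column to an existing table is a one-line schema change.  A
sketch, again with SQLite standing in for MySQL; the table name and the SVG
path string are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE glyphs (glyph_id INTEGER PRIMARY KEY, bitmap_data BLOB)")
conn.execute("INSERT INTO glyphs VALUES (1, NULL)")

# Add the outline column; existing rows get NULL until someone draws an outline.
conn.execute("ALTER TABLE glyphs ADD COLUMN stroked_glyph_data TEXT")
conn.execute(
    "UPDATE glyphs SET stroked_glyph_data = ? WHERE glyph_id = 1",
    ('<path d="M1 15 L4 1 L7 15 M2 9 L6 9"/>',),  # placeholder SVG path
)
columns = [row[1] for row in conn.execute("PRAGMA table_info(glyphs)")]
print(columns)  # ['glyph_id', 'bitmap_data', 'stroked_glyph_data']
```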

Well, just more ideas for everyone's consumption ...

- Ed

> 
> On Tue, 11 Jan 2005, Martin Wiss wrote:
> 
> > > I think Edward states that this is his opinion. I would also love to see
> > > all of Unicode 4.1 supported
> > > on the Linux console. However it looks difficult to get someone with those
> > > two incompatible(?) skills, Linux kernel programming and love for
> > > linguistics...
> >
> > Quite often there are various people wanting to make Linux available in
> > their own languages. There are various Linux user groups (for example, for
> > Myanmar, Khmer, Indic scripts and so on...)
> > But it seems like they get lost, I think because of a lack of organization
> > and cooperation.
> > It has to be a cooperative effort in order to get full support in one
> > implementation.
> > As we have stated, no one can possibly know everything about all the scripts
> > in the world, as well as know kernel programming, and have the time to do
> > all this work.
> > So the implementation of each script should preferably be done by native
> > speakers of the various languages. But the work has to be coordinated in
> > some way.
> >
> > Another question: What is the use of the concept of "four console character
> > cells", "double cell width", etc. for scripts that have various cell widths
> > (like Burmese)? Shouldn't one cell always be large enough to fit the char?
> > Isn't it better to always put one character into one cell, and instead
> > increase or decrease the cell width? I guess that is what the concept of
> > cells is used for... I mean there is no use in having multiple cells for one
> > char. It is like selecting the left part of an "M" or the right part of a
> > "Z". Why would one want to do something like that?
> >
> > By the way, I think the Burmese script is the most beautiful and elegant
> > script in the world. And it would be wonderful if it could be used in Linux
> > terminals, not only because of its aesthetic features but also because I
> > think it would be important for connectivity and development in Myanmar.
> > And increased connectivity could have many other positive effects for that
> > country.
> >
> > Martin
> >
> >
> > --
> > Linux-UTF8:   i18n of Linux on all levels
> > Archive:      http://mail.nl.linux.org/linux-utf8/
> 
> --behdad
> http://behdad.org/
> 
