Re: Unicode and the Linux console (again)

Edward H. Trager Tue, 11 Jan 2005 07:17:10 -0800

On Monday 2005.01.10 22:36:48 +0100, Keld Jørn Simonsen wrote:
> On Mon, Jan 10, 2005 at 04:35:26PM -0500, Edward H. Trager wrote:
> > Hi, Simos,
> > 
> > Some months ago I had had the idea of trying to fill out the missing
> > parts of the GNU Unifont bitmap font. When one looks at a script like
> > Myanmar, it is not at all obvious how one should try to "squish" the
> > various glyphs into one cell or two cells. Some characters, like
> > MYANMAR LETTER KA U+1000, clearly look like they should take up two
> > console character cells, just like Han Chinese characters do. Others,
> > like MYANMAR LETTER KHA U+1001, clearly need only one character cell.
> > Other letters, like MYANMAR LETTER II U+1024, ought to use up *THREE
> > CONSOLE CHARACTER CELLS*, and MYANMAR LETTER AU U+102A should have
> > *FOUR CONSOLE CHARACTER CELLS*.
> >
> > Has anyone ever thought about this before? So, if you ask me, having
> > the option of "single width" vs. "double width" vs. "zero-width" (i.e.,
> > accent marks or other diacritics that combine with a previous character
> > but don't take up any additional console character cells) is not
> > enough. There has to be a system that would allow for zero, one, two,
> > three, and four character cell widths. Maybe even more; I'd have to
> > look more carefully to know the answer.
> >
> > One can envision a similar problem for other Indic and Indic-derived
> > scripts, like Devanagari.
> 
> Yes, this is handled in the ISO TR 14652 locales, which are largely
> implemented in glibc. The LC_CTYPE "width" keyword, and the "width"
> keyword in ISO TR 14652 charmaps are the places to define the width. The
> width may be larger than 2. The C function wcwidth() addresses this from
> the API side. I believe it is all implemented in glibc.


Really, is it accurate? 

(I had tried using wcwidth() in a program I wrote and found that it gave
incorrect answers for certain scripts. I forget now which scripts they
were; probably Arabic, Thai, and Devanagari were among them. Since I
wanted to have my program work correctly even on BSDs like OpenBSD that do
not yet have NLS support (I believe a Japanese group has a project called
Citrus to provide NLS locale support for the BSDs, but I don't know at what
stage it is), I ended up writing my own consoleStringWidth() function, which
works correctly with GNU Unifont and mlterm for at least the set of scripts
that I had been testing. But even what I wrote is still incomplete, since
there is as yet no console support for scripts like Myanmar, Tibetan,
Khmer, etc.)
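
To make the idea of a locale-independent width function concrete, here is a
minimal sketch in Python. It is not the consoleStringWidth() mentioned
above, whose source is not shown in this message. The names
console_char_width, console_string_width, and CELL_OVERRIDES are
hypothetical, and the override values simply repeat the cell counts
suggested in the quoted text for Myanmar; they are assumptions, not what
any font or terminal actually does. Everything else is guessed from Unicode
properties: nonspacing marks, enclosing marks, and format characters get
zero cells, East Asian Wide and Fullwidth characters get two, and all
remaining characters (control characters included) get one.

```python
import unicodedata

# Hypothetical per-character overrides for glyphs assumed to need a
# specific number of cells. The values follow the suggestions quoted
# earlier in this thread and are illustrative only.
CELL_OVERRIDES = {
    0x1000: 2,  # MYANMAR LETTER KA
    0x1024: 3,  # MYANMAR LETTER II
    0x102A: 4,  # MYANMAR LETTER AU
}


def console_char_width(ch, overrides=CELL_OVERRIDES):
    """Return an estimated number of console cells for one character."""
    code_point = ord(ch)
    if code_point in overrides:
        return overrides[code_point]
    # Nonspacing marks (Mn), enclosing marks (Me), and format characters
    # (Cf) are treated as attaching to the previous cell.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including control characters, is counted as one.
    return 1


def console_string_width(text, overrides=CELL_OVERRIDES):
    """Sum the estimated cell widths of every character in text."""
    return sum(console_char_width(ch, overrides) for ch in text)


if __name__ == "__main__":
    for sample in ("abc", "\u6f22\u5b57", "e\u0301", "\u1000\u1001\u102a"):
        print(repr(sample), console_string_width(sample))
```

A table-driven override like this is only one way to allow widths above
two; it does no shaping, so scripts whose glyphs reorder or ligate would
still be measured wrongly.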
 
> 
> best regards
> keld
> 
> --
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/linux-utf8/

