Re: Unicode and the Linux console (again)

Keld Jørn Simonsen Tue, 11 Jan 2005 08:52:46 -0800

On Tue, Jan 11, 2005 at 10:40:59AM -0500, Edward H. Trager wrote:
> On Monday 2005.01.10 22:36:48 +0100, Keld Jørn Simonsen wrote:
> > On Mon, Jan 10, 2005 at 04:35:26PM -0500, Edward H. Trager wrote:
> > > Hi, Simos,
> > > 
> > > Some months ago I had the idea of trying to fill out the missing
> > > parts of the GNU Unifont bitmap font. When one looks at a script
> > > like Myanmar, it is not at all obvious how one should try to
> > > "squish" the various glyphs into one cell or two cells. Some
> > > characters, like MYANMAR LETTER KA U+1000, clearly look like they
> > > should take up two console character cells, just like Han Chinese
> > > characters do. Others, like MYANMAR LETTER KHA U+1001, clearly
> > > need only one character cell. Other letters, like MYANMAR LETTER
> > > II U+1024, ought to use up *THREE CONSOLE CHARACTER CELLS*, and
> > > MYANMAR LETTER AU U+102A should have *FOUR CONSOLE CHARACTER
> > > CELLS*. Has anyone ever thought about this before? So, if you ask
> > > me, having the option of "single width" vs. "double width" vs.
> > > "zero-width" (i.e., accent marks or other diacritics that combine
> > > with a previous character but don't take up any additional console
> > > character cells) is not enough. There has to be a system that
> > > would allow for zero, one, two, three, and four character cell
> > > widths. Maybe even more--I'd have to look more carefully to know
> > > the answer. One can envision a similar problem for other Indic and
> > > Indic-derived scripts, like Devanagari.
> > 
> > Yes, this is handled in the ISO TR 14652 locales, which are largely
> > implemented in glibc. The LC_CTYPE "width" keyword, and the "width"
> > keyword in ISO TR 14652 charmaps are the places to define the width. The
> > width may be larger than 2. The C function wcwidth() addresses this from
> > the API side. I believe it is all implemented in glibc.
> 
> Really, is it accurate?


I don't know if it is accurate. I think you are talking about the data
you would obtain with the wcwidth() function in standard glibc. I am not
sure of its origin. It may come from Markus Kuhn, or it may be something
I made. There is also the question of what "accurate" means. A width for
a character may be correct for one character set but not for another,
e.g. a number of Latin and Greek letters are double width in some East
Asian character sets, but not in European character sets. That is why
there is a "width" keyword in the charmap spec, to override the general
width.
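
To make the override idea concrete, here is a minimal Python sketch. It is
not glibc's wcwidth() data and not ISO TR 14652 charmap syntax. The names
general_width(), charmap_width(), DEFAULT_WIDTH and the two override
tables are hypothetical, and the default widths are an assumption derived
from the Unicode East_Asian_Width property as exposed by Python's
unicodedata module. It only shows a general width that a per-charset
table may override.

```python
import unicodedata

# Assumed default cell counts per East_Asian_Width class (illustrative).
# "A" (ambiguous) is treated as narrow, as a European charset would do.
DEFAULT_WIDTH = {"W": 2, "F": 2, "A": 1, "N": 1, "Na": 1, "H": 1}


def general_width(ch):
    """Return a default cell count for one character."""
    # Nonspacing and enclosing marks take no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    return DEFAULT_WIDTH[unicodedata.east_asian_width(ch)]


def charmap_width(ch, overrides):
    """Width under one charset: its override table wins over the default."""
    return overrides.get(ch, general_width(ch))


# Hypothetical override tables standing in for a charmap's width section.
EUROPEAN_OVERRIDES = {}
EAST_ASIAN_OVERRIDES = {"\u03b1": 2, "\u0414": 2}  # Greek alpha, Cyrillic De

if __name__ == "__main__":
    for name, table in (("European", EUROPEAN_OVERRIDES),
                        ("East Asian", EAST_ASIAN_OVERRIDES)):
        print(name, charmap_width("\u03b1", table))
```

Nothing in such a table limits a width to 2, so a charset could in
principle assign 3 or 4 cells to a character the same way.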

best regards
keld

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
