On Mon, Jan 10, 2005 at 04:35:26PM -0500, Edward H. Trager wrote:
> Hi, Simos,
> 
> Some months ago I had the idea of trying to fill out the missing parts
> of the GNU Unifont bitmap font.  When one looks at a script like Myanmar,
> it is not at all obvious how one should try to "squish" the various
> glyphs into one cell or two cells.  Some characters, like MYANMAR LETTER
> KA U+1000, clearly look like they should take up two console character
> cells, just like Han Chinese characters do.  Others, like MYANMAR LETTER
> KHA U+1001, clearly need only one character cell.  Other letters, like
> MYANMAR LETTER II U+1024, ought to use up *THREE CONSOLE CHARACTER CELLS*,
> and MYANMAR LETTER AU U+102A should have *FOUR CONSOLE CHARACTER CELLS*.
> Has anyone ever thought about this before?  So, if you ask me, having the
> option of "single width" vs. "double width" vs. "zero-width" (i.e., accent
> marks or other diacritics that combine with a previous character but
> don't take up any additional console character cells) is not enough.
> There has to be a system that would allow for zero, one, two, three, and
> four character cell widths.  Maybe even more--I'd have to look more
> carefully to know the answer.
> One can envision a similar problem for other Indic and Indic-derived
> scripts, like Devanagari.

Yes, this is handled in the ISO TR 14652 locales, which are largely
implemented in glibc. The LC_CTYPE "width" keyword and the "width"
keyword in ISO TR 14652 charmaps are the places to define the width. The
width may be larger than 2. The C function wcwidth() addresses this from
the API side. I believe it is all implemented in glibc.
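
To illustrate the API side, here is a minimal sketch of summing wcwidth()
results to get the number of console cells a string needs. The helper name
display_cells is hypothetical, and the widths returned depend entirely on
what the active locale defines, so a width of three or four for a given
Myanmar letter is only reported if the locale data says so.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in <wchar.h> on glibc */
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: total number of console cells needed for s,
 * or -1 if any character is non-printable in the current locale. */
int display_cells(const wchar_t *s)
{
    int total = 0;
    for (; *s != L'\0'; s++) {
        /* wcwidth() returns 0 for combining marks, a positive cell
         * count for printable characters, and -1 otherwise. */
        int w = wcwidth(*s);
        if (w < 0)
            return -1;
        total += w;
    }
    return total;
}
```

A caller would normally run setlocale(LC_ALL, "") from <locale.h> once at
startup, so that wcwidth() consults the user's locale rather than the
default "C" locale.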

best regards
keld

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
