On Mon, Jan 10, 2005 at 04:35:26PM -0500, Edward H. Trager wrote:
> Hi, Simos,
>
> Some months ago I had had the idea of trying to fill out the missing
> parts of the GNU Unifont bitmap font: When one looks at a script like
> Myanmar, it is not at all obvious how one should try to "squish" the
> various glyphs into one cell or two cells. Some characters, like
> MYANMAR LETTER KA u+1000 clearly look like they should take up two
> console character cells, just like Han chinese characters do. Others,
> like MYANMAR LETTER KHA u+1001 clearly need only one character cell.
> Other letters like MYANMAR LETTER II u+1024 ought to use up *THREE
> CONSOLE CHARACTER CELLS* and MYANMAR LETTER AU u+102A should have
> *FOUR CONSOLE CHARACTER CELLS*. Has anyone ever thought about this
> before? So, if you ask me, having the option of "single width" vs.
> "double width" vs. "zero-width" (i.e., accent marks or other
> diacritics that combine with a previous character but don't take up
> any additional console character cells) is not enough. There has to
> be a system that would allow for zero, one, two, three, and four
> character cell widths. Maybe even more--I'd have to look more
> carefully to know the answer.
> One can envision a similar problem for other Indic and Indic-derived
> scripts, like Devanagari.

Yes, this is handled in the ISO TR 14652 locales, which are largely
implemented in glibc. The "width" keyword in LC_CTYPE and the "width"
keyword in ISO TR 14652 charmaps are the places to define the width.
The width may be larger than 2. The C function wcwidth() addresses this
from the API side. I believe it is all implemented in glibc.

best regards
keld
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
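
To illustrate the API side mentioned above, here is a minimal sketch of how a program might add up the cell widths that wcwidth() reports for a string. The helper name display_width is hypothetical and not part of any library. It does roughly what the standard wcswidth() already does, and is written out only to show the per-character call. The widths returned depend entirely on the locale data in use, so nothing here should be read as a statement of what glibc reports for any particular Myanmar character.

```c
#define _XOPEN_SOURCE 700  /* exposes wcwidth() in <wchar.h> on glibc */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: sum the column widths that wcwidth() reports for
 * each character of s. Returns -1 if any character has no defined width
 * in the current locale (for example a control character). */
int display_width(const wchar_t *s)
{
    assert(s != NULL);
    int total = 0;
    for (; *s != L'\0'; ++s) {
        int w = wcwidth(*s);  /* 0 for combining marks; 1, 2 or more otherwise */
        if (w < 0)
            return -1;
        total += w;
    }
    return total;
}
```

A real program would normally call setlocale(LC_CTYPE, "") once at startup, so that the widths come from the user's locale rather than from the default "C" locale.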
