On Monday 2005.01.10 22:36:48 +0100, Keld Jørn Simonsen wrote:
> On Mon, Jan 10, 2005 at 04:35:26PM -0500, Edward H. Trager wrote:
> > Hi, Simos,
> >
> > Some months ago I had the idea of trying to fill out the missing parts
> > of the GNU Unifont bitmap font. When one looks at a script like
> > Myanmar, it is not at all obvious how one should try to "squish" the
> > various glyphs into one cell or two cells. Some characters, like
> > MYANMAR LETTER KA U+1000, clearly look like they should take up two
> > console character cells, just like Han Chinese characters do. Others,
> > like MYANMAR LETTER KHA U+1001, clearly need only one character cell.
> > Other letters, like MYANMAR LETTER II U+1024, ought to use up *THREE
> > CONSOLE CHARACTER CELLS*, and MYANMAR LETTER AU U+102A should have
> > *FOUR CONSOLE CHARACTER CELLS*. Has anyone ever thought about this
> > before? So, if you ask me, having the option of "single width" vs.
> > "double width" vs. "zero width" (i.e., accent marks or other
> > diacritics that combine with a previous character but don't take up
> > any additional console character cells) is not enough. There has to
> > be a system that would allow for zero, one, two, three, and four
> > character cell widths. Maybe even more; I'd have to look more
> > carefully to know the answer. One can envision a similar problem for
> > other Indic and Indic-derived scripts, like Devanagari.
>
> Yes, this is handled in the ISO TR 14652 locales, which are largely
> implemented in glibc. The LC_CTYPE "width" keyword and the "width"
> keyword in ISO TR 14652 charmaps are the places to define the width.
> The width may be larger than 2. The C function wcwidth() addresses
> this from the API side. I believe it is all implemented in glibc.
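To illustrate the API side Keld mentions, here is a minimal C sketch that sums wcwidth() over a multibyte string. The function name console_columns() is hypothetical. The sketch assumes the caller has already run setlocale(LC_CTYPE, "") so that the locale's width table is in effect. Like wcwidth() itself, it returns -1 for anything it cannot measure (an invalid or truncated sequence, or a non-printable character).

```c
#define _XOPEN_SOURCE 700 /* wcwidth() is an X/Open extension in glibc */
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of console columns needed to display the
 * multibyte string s in the current LC_CTYPE locale, or -1 if s holds an
 * invalid/truncated sequence or a non-printable character. */
int console_columns(const char *s)
{
    assert(s != NULL);

    mbstate_t st;
    memset(&st, 0, sizeof st);       /* start in the initial shift state */
    size_t left = strlen(s);
    int total = 0;

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;               /* invalid or incomplete sequence */
        if (n == 0)
            break;                   /* reached a NUL wide character */
        int w = wcwidth(wc);         /* width from the locale's table */
        if (w < 0)
            return -1;               /* non-printable, e.g. a tab */
        total += w;
        s += n;
        left -= n;
    }
    return total;
}
```

The answer is only as good as the width table in the active locale, which is exactly the point in question below. POSIX also offers wcswidth() for a string that has already been converted to wide characters.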

Really? Is it accurate? I had tried using wcwidth() in a program I wrote and found that it gave incorrect answers for certain scripts. I forget now which scripts they were; probably Arabic, Thai, and Devanagari were among them. Since I wanted my program to work correctly even on BSDs like OpenBSD that do not yet have NLS support (I believe a Japanese group has a project called Citrus to provide NLS locale support for the BSDs, but I don't know what stage it is at), I ended up writing my own consoleStringWidth() function. It works correctly with GNU Unifont and mlterm for at least the set of scripts I had been testing. But even what I wrote is still incomplete, since there is as yet no console support for scripts like Myanmar, Tibetan, Khmer, etc.

> best regards
> keld
>
> --
> Linux-UTF8: i18n of Linux on all levels
> Archive: http://mail.nl.linux.org/linux-utf8/
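The message does not show consoleStringWidth(). What follows is a hypothetical, locale-independent sketch in the same spirit, not the author's code. It decodes UTF-8 itself and looks code points up in its own width tables instead of calling the C library's wcwidth(), so it needs no NLS support. The tables are a deliberately tiny illustrative subset: a few Latin, Thai and Devanagari combining marks and some East Asian wide ranges. A usable version would need complete tables generated from the Unicode data files. It still only knows widths 0, 1 and 2, so it does nothing for the multi-cell Myanmar glyphs discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

struct range { uint32_t first, last; };

/* Illustrative subset only: nonspacing marks (width 0). */
static const struct range zero_width[] = {
    {0x0300, 0x036F},   /* Combining Diacritical Marks */
    {0x0941, 0x0948},   /* Devanagari vowel signs U .. AI */
    {0x094D, 0x094D},   /* DEVANAGARI SIGN VIRAMA */
    {0x0E31, 0x0E31},   /* THAI CHARACTER MAI HAN-AKAT */
    {0x0E34, 0x0E3A},   /* Thai vowel signs SARA I .. PHINTHU */
    {0x0E47, 0x0E4E},   /* Thai MAITAIKHU .. YAMAKKAN */
};

/* Illustrative subset only: East Asian wide characters (width 2). */
static const struct range double_width[] = {
    {0x1100, 0x115F},   /* Hangul Jamo initial consonants */
    {0x4E00, 0x9FFF},   /* CJK Unified Ideographs */
    {0xAC00, 0xD7A3},   /* Hangul Syllables */
    {0xFF01, 0xFF60},   /* Fullwidth forms */
};

/* Linear search is fine for a toy table; a full table would use bsearch. */
static int in_table(uint32_t cp, const struct range *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= t[i].first && cp <= t[i].last)
            return 1;
    return 0;
}

/* Columns for one code point: -1 for C0/C1 controls, else 0, 1 or 2. */
static int cp_width(uint32_t cp)
{
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;
    if (in_table(cp, zero_width, COUNT(zero_width)))
        return 0;
    if (in_table(cp, double_width, COUNT(double_width)))
        return 2;
    return 1;
}

/* Hypothetical console_string_width(): total columns for a UTF-8 string,
 * or -1 on a control character or malformed input. Minimal decoder: it
 * does not reject overlong forms or surrogates. */
int console_string_width(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    int total = 0;

    while (*p) {
        uint32_t cp;
        int len;
        if (p[0] < 0x80)                { cp = p[0];        len = 1; }
        else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; len = 2; }
        else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; len = 3; }
        else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; len = 4; }
        else return -1;                 /* invalid lead byte */

        for (int i = 1; i < len; i++) {
            if ((p[i] & 0xC0) != 0x80)  /* also stops at the final NUL */
                return -1;
            cp = (cp << 6) | (p[i] & 0x3F);
        }
        int w = cp_width(cp);
        if (w < 0)
            return -1;
        total += w;
        p += len;
    }
    return total;
}
```

For example, "e" followed by U+0301 COMBINING ACUTE ACCENT measures 1 column, and DEVANAGARI LETTER KA plus VIRAMA also measures 1. Keeping the tables inside the program trades automatic locale updates for identical results on every platform, including systems without NLS.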
