On Tue, Jan 11, 2005 at 10:40:59AM -0500, Edward H. Trager wrote:
> On Monday 2005.01.10 22:36:48 +0100, Keld Jørn Simonsen wrote:
> > On Mon, Jan 10, 2005 at 04:35:26PM -0500, Edward H. Trager wrote:
> > > Hi, Simos,
> > >
> > > Some months ago I had had the idea of trying to fill out the
> > > missing parts of the GNU Unifont bitmap font:
> > > When one looks at a script like Myanmar, it is not at all obvious
> > > how one should try to "squish" the various glyphs into one cell
> > > or two cells. Some characters, like MYANMAR LETTER KA u+1000,
> > > clearly look like they should take up two console character
> > > cells, just like Han Chinese characters do. Others, like MYANMAR
> > > LETTER KHA u+1001, clearly need only one character cell. Other
> > > letters like MYANMAR LETTER II u+1024 ought to use up *THREE
> > > CONSOLE CHARACTER CELLS* and MYANMAR LETTER AU u+102A should have
> > > *FOUR CONSOLE CHARACTER CELLS*.
> > > Has anyone ever thought about this before? So, if you ask me,
> > > having the option of "single width" vs. "double width" vs.
> > > "zero-width" (i.e., accent marks or other diacritics that combine
> > > with a previous character but don't take up any additional
> > > console character cells) is not enough. There has to be a system
> > > that would allow for zero, one, two, three, and four character
> > > cell widths. Maybe even more--I'd have to look more carefully to
> > > know the answer.
> > > One can envision a similar problem for other Indic and
> > > Indic-derived scripts, like Devanagari.
> >
> > Yes, this is handled in the ISO TR 14652 locales, which are largely
> > implemented in glibc. The LC_CTYPE "width" keyword, and the "width"
> > keyword in ISO TR 14652 charmaps, are the places to define the
> > width. The width may be larger than 2. The C function wcwidth()
> > addresses this from the API side. I believe it is all implemented
> > in glibc.
>
> Really, is it accurate?

I don't know if it is accurate. I think you are referring to the data
you would obtain from the wcwidth() function in standard glibc. I am
not sure where that data originated. It may come from Markus Kuhn, or
it may be something I made.

There is also the question of what "accurate" means. A width may be
correct for a character in one character set but not in another; e.g.
a number of Latin and Greek letters are double width in some East
Asian character sets, but not in European character sets. That is why
the charmap spec has a "width" keyword, to override the general width.

best regards
keld
-- 
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
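
To illustrate the override idea described above, here is a minimal
sketch in Python. It is not glibc's wcwidth() and does not read real
ISO TR 14652 charmaps. The table CHARMAP_WIDTH_OVERRIDES, the charmap
names and the helper functions are all hypothetical, and the Myanmar
widths are only the three- and four-cell values suggested in the
quoted message, not data from any shipped locale. The general width is
guessed from Unicode properties, and a per-charmap entry, if present,
overrides it.

```python
import unicodedata

# Hypothetical override tables, standing in for the "width" keyword of a
# charmap. Names and contents are made up for illustration only.
CHARMAP_WIDTH_OVERRIDES = {
    "european": {},
    # Some East Asian character sets treat Greek letters as double width.
    "east_asian": {"\u03b1": 2},
    # Widths proposed in the quoted message, not real locale data.
    "myanmar_console": {"\u1024": 3, "\u102a": 4},
}


def default_width(ch):
    """General width guessed from Unicode properties (an assumption)."""
    if unicodedata.combining(ch) != 0:
        return 0  # combining marks take no cell of their own
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two cells
    return 1


def char_width(ch, charmap):
    """Width of one character, letting the charmap override the default."""
    overrides = CHARMAP_WIDTH_OVERRIDES.get(charmap, {})
    return overrides.get(ch, default_width(ch))


def string_width(text, charmap):
    """Total number of console cells for a string under a given charmap."""
    return sum(char_width(ch, charmap) for ch in text)
```

With these tables, GREEK SMALL LETTER ALPHA is one cell wide under the
"european" charmap and two cells wide under "east_asian", while
MYANMAR LETTER II gets three cells only where the override says so.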
