On Fri, Aug 18, 2006 at 03:39:17AM -0700, rajeev joseph sebastian wrote:
>
> Hello Rich Felker,
>
> ---- start quote ----
> 1. Does any existing character cell application (terminal emulator)
> both display correctly-rendered Indic text and conform to WI1, i.e.
> does it update column position according to wcwidth() and not the
> OpenType-rendered width of the text string? I suspect not. RTFS'ing
> mlterm it seems like it does not. I can't find any good info on
> ncst-term.
>
> 2. Are there serious limitations of WI2 that make it impossible to
> display [legibly] certain consonant clusters? Can the ZWJ/ZWNJ
> semantics be satisfied correctly?
>
> 3. Other comments?
> ---- end quote ----
>
> I have a question on this. By "single width", "double width", do you
> mean a global width constant, or a width that can be specified by
> the font ?

Width specified by font is simply not possible, regardless of how nice it would look or how bad the alternatives would look. The most complex program that will work correctly with such a system is "cat". Anything more complex, be it a tabular message list in mutt, the text you're editing in a text editor, or a single-line input field, will corrupt the display horribly as soon as the presentation width disagrees with the logical wcwidth width. As bad as too much or too little spacing looks, having the whole terminal corrupt and leave 'droppings' all over the place when you move the cursor looks much worse...

There is the possibility within POSIX to use the wcswidth function instead of wcwidth, which in theory could accommodate context-sensitive widths. Whether this is considered conformant I don't know, but I do know that presently few apps support this, and that most apps would require significant rewrites and considerable additional complexity to do so.

My proposed WI2 was to treat consonant clusters, rather than individual consonants, as the element with a fixed width and assign them the width of 2 (same as CJK ideographs and Hangul Jamo, the latter of which seems to be the well-handled script with the most in common with Indic consonant clusters). I'm fairly ignorant about nice Indic typesetting, but in my casual observations all the common clusters I could find fit reasonably into a double-width cell. On the other hand, I'm worried that the "-2 width" for the virama would confuse applications hopelessly, and that isolated dead letters would have the wrong width.

Since you seem to be familiar with the matter, perhaps you could comment on whether displaying text in fixed one-cell-per-character form without width-altering ligatures is considered acceptable. My impression is that it would be mostly acceptable in Devanagari except for the behavior of "ra", but might be significantly worse in other scripts (Kannada?) which seem to make more use of vertical combining.

> Either way, Indic texts on a console would look really bad and be
> practically unusable if glyphs had to be put into a specified width:
> there would be too much spacing. Indic texts by their nature are
> most suited to variable-widths.

As far as I can tell they're presently unusable. I'm just trying to find a way to make them usable and, hopefully, not make them ugly in the process. If there are any working implementations already (in your opinion), I'd be happy to hear about how they work.

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
