Re: Indic scripts and wcwidth: comments?

Rich Felker Fri, 18 Aug 2006 10:26:05 -0700

On Fri, Aug 18, 2006 at 03:39:17AM -0700, rajeev joseph sebastian wrote:
> 
> 
> Hello Rich Felker,
> 
> ---- start quote ----
> 1. Does any existing character cell application (terminal emulator)
>    both display correctly-rendered Indic text and conform to WI1, i.e.
>    does it update column position according to wcwidth() and not the
>    OpenType-rendered width of the text string? I suspect not. RTFS'ing
>    mlterm it seems like it does not. I can't find any good info on
>    ncst-term.
> 
> 2. Are there serious limitations of WI2 that make it impossible to
>    display [legibly] certain consonant clusters? Can the ZWJ/ZWNJ
>    semantics be satisfied correctly?
> 
> 3. Other comments?
> ---- end quote ----
> 
> I have a question on this. By "single width", "double width", do you
> mean a global width constant, or a width that can be specified by
> the font ?


Width specified by font is simply not possible, regardless of how nice
it would look or how bad the alternatives would look. The most complex
program that will work correctly with such a system is "cat". Anything
more complex, be it a tabular message list in mutt, the text you're
editing in a text editor, or a single-line entry field, will corrupt
the display horribly as soon as the presentation width disagrees with
the logical wcwidth width. As bad as too much or too little spacing
looks, having the whole terminal corrupt and leave 'droppings' all
over the place when you move the cursor looks much worse...
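
To make the failure concrete, here is a minimal sketch in C of the
column bookkeeping such programs do. The helper name column_after is
hypothetical, and the _XOPEN_SOURCE define is an assumption about what
the C library needs in order to declare wcwidth().

```c
#define _XOPEN_SOURCE 700   /* assumed necessary (e.g. on glibc) to declare wcwidth() */
#include <assert.h>
#include <wchar.h>

/*
 * Hypothetical helper: the column the cursor lands in after printing s
 * starting at start_col, as the application computes it. Only wcwidth()
 * is consulted; the font's rendered width never enters the calculation.
 * Returns -1 if s contains a character wcwidth() reports as non-printable.
 */
int column_after(const wchar_t *s, int start_col)
{
    assert(s != NULL);
    int col = start_col;
    for (; *s != L'\0'; s++) {
        int w = wcwidth(*s);
        if (w < 0)
            return -1;
        col += w;
    }
    return col;
}
```

If the terminal advanced by the font's rendered width instead, its
idea of the column and the application's would drift apart after the
first mismatching string, and every later cursor move would land in
the wrong cell.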

There is the possibility within POSIX to use the wcswidth function
instead of wcwidth, which in theory could accommodate
context-sensitive widths. Whether this is considered conformant I
don't know, but I do know that presently few apps support this, and
that most would need significant rewrites and major additional
complexity to do so.
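
As an illustration of what context-sensitive widths would change, the
sketch below compares wcswidth() on a whole string with the
per-character wcwidth() sum that most programs rely on. The function
name widths_agree is hypothetical, and the _XOPEN_SOURCE define is an
assumption about the C library. Common implementations are believed to
compute wcswidth() as exactly this sum, so the check is expected to
pass today; a context-sensitive wcswidth() would make it fail for some
strings.

```c
#define _XOPEN_SOURCE 700   /* assumed necessary to declare wcwidth()/wcswidth() */
#include <assert.h>
#include <wchar.h>

/*
 * Hypothetical check: returns 1 if wcswidth() over the whole string
 * equals the sum of wcwidth() over its characters, 0 otherwise.
 */
int widths_agree(const wchar_t *s)
{
    assert(s != NULL);
    int whole = wcswidth(s, wcslen(s));
    int sum = 0;
    for (; *s != L'\0'; s++) {
        int w = wcwidth(*s);
        if (w < 0)
            return whole < 0;   /* both should report a non-printable character */
        sum += w;
    }
    return whole == sum;
}
```

Any program that positions the cursor by adding up wcwidth() values is
implicitly assuming this function always returns 1.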

My proposed WI2 was to treat consonant clusters, rather than
individual consonants, as the element with a fixed width and assign
them the width of 2 (same as CJK ideographs and Hangul Jamo, the
latter of which seems to be the well-handled script with the most in
common with Indic consonant clusters). I'm fairly ignorant about nice
Indic typesetting, but in my casual observations all the common
clusters I could find fit reasonably into a double-width cell. On
the other hand I'm worried that the "-2 width" for the virama would
confuse applications hopelessly, and that isolated dead letters would
have the wrong width.
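
The arithmetic behind those two worries can be sketched as follows.
This is only one assumed reading of WI2, restricted to Devanagari:
each consonant (U+0915 to U+0939) counts 2, the virama (U+094D)
counts -2, and everything else is ignored. The names wi2_char_width
and wi2_width are hypothetical, and the table is not taken from any
real wcwidth() implementation.

```c
#include <assert.h>
#include <wchar.h>

/* Assumed WI2-style per-character widths, Devanagari only (illustrative). */
static int wi2_char_width(wchar_t c)
{
    if (c >= 0x0915 && c <= 0x0939)
        return 2;    /* consonants KA..HA */
    if (c == 0x094D)
        return -2;   /* virama cancels the width of the consonant before it */
    return 0;        /* all other characters are ignored in this sketch */
}

/* Sum of the per-character widths, the way an application would add them up. */
int wi2_width(const wchar_t *s)
{
    assert(s != NULL);
    int total = 0;
    for (; *s != L'\0'; s++)
        total += wi2_char_width(*s);
    return total;
}
```

Under these assumed values KA + virama + SSA sums to 2 + (-2) + 2 = 2,
one double-width cell, while a dead KA (KA + virama with nothing after
it) sums to 0, which is the wrong-width case. Code that treats any
negative wcwidth() return as "non-printable", as is conventional,
would also mishandle the virama's -2.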

Since you seem to be familiar with the matter, perhaps you could
comment on whether displaying text in fixed one-cell-per-character
form without width-altering ligatures is considered acceptable. My
impression is that it would be mostly acceptable in Devanagari except
for the behavior of "ra", but might be significantly worse in other
scripts (Kannada?) which seem to make more use of vertical combining.

> Either way, Indic texts on a console would look really bad and be
> practically unusable if glyphs had to be put into a specified width:
> there would be too much spacing. Indic texts by their nature are
> most suited to variable-widths.

As far as I can tell they're presently unusable. I'm just trying to
find a way to make them usable and hopefully not make them ugly in the
process. If there are any working implementations already (in your
opinion) I'd be happy to hear about how they work.

Rich


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
