On Tue, Oct 31, 2006 at 09:37:34AM -0800, rajeev joseph sebastian wrote:
> Hi Rich Felker,
>
> I find your work to provide support for Indic text on
> console/terminal to be admirable, and yes, any kind of display is
> far better than none at all (and I do not consider your statement
> insulting) :)
>
> What I was referring to was a comment along the lines of "... have a
> set of wcwidth classes (say, 1, 2, and 3) and assign - glyphs - to
> one of those classes... ". (Please forgive me if I misunderstood the
> last few posts.) The word to note is "glyph". What I'm saying is you
> cannot in advance specify the width of any given conjunct. It may be
> different in different fonts.

Yes, but my use of the word "character" rather than "glyph" was
intentional. I know that the typographically correct way to do spacing
would be to measure the width of glyphs, but for better or worse the
only standardized API (wcwidth) works in terms of characters, and
terminals work in terms of characters.

Sometimes this has benefits. For example, it means you can highlight
text that was printed to the terminal and paste it into other apps or
back into the terminal, with exact results which are suitable for
filenames and such. This might not be possible if the app running in
the terminal had converted the text to a glyph representation. So in a
way it's nice that the character->glyph conversion is done at the last
step, in the terminal, since it keeps the data in the logical
representation instead of the presentation form. Of course it also has
downsides, as I'm sure we're all aware.

The other issue here is that there's no standard for glyph numbering,
and Unicode doesn't represent glyphs, so there's really no way an
application running on a terminal could directly print glyphs. Even if
it could, just "cat file_with_indic_text.txt" on the terminal, or
something simple like "ls", would probably not work as expected.

My hope is to work out a set of width assignments for characters so
that reasonable glyph presentations of the character sequence always
fit in the spacing provided by the sum of the "character widths".
Unfortunately this may result in excess spacing in some (many?) cases,
but I hope it can be made usable if not elegant. My (naive)
understanding is that Kannada conjuncts take place mostly as a
"subscript" to the bottom-right of the initial consonant and vowel
mark, so perhaps they'll look fairly proper in such a scheme.

> I suppose, we need to develop console specific fonts which can make
> proper use of the available width classes (or the structure you
> propose), however, I don't think any research has occurred in this
> regard.

Well, as long as a reasonable font size were chosen, any font that
fits into the (possibly excessive) width allocation could be used in
principle. For uuterm I'm working on 8x16-cell bitmap fonts (and later
other, larger sizes), which I find much more usable, but there's no
reason other terminal emulators like mlterm couldn't use TrueType
fonts in this framework.

> So, a proper answer to your question: how many width classes, really
> needs a lot of work both artistic as well as technical. (Malayalam
> has about 950 conjuncts, so it has to be seen how they can fit into
> those classes).

Well, my question is much simpler, I think: given a character, what's
the "most space" it can take up in any conjunct it forms?

> Speaking of curses, doesn't Debian/(K)ubuntu use curses for its
> installer? I remember telling the Kubuntu devels to remove Hindi
> from the list of languages, because looking at the rendering is
> really horrible (misplaced vowels, and so many other things,
> unrelated to spacing/width).

Yes, though it's not really a curses problem. As long as the terminal
supports reordering and ligatures, using curses should not be much of
a problem. I still need to write the reordering stuff for uuterm
though.

> It is unfortunate, that many developers think that by using
> widestrings for each character is equivalent to support for all
> languages under Unicode. I guess some even think that the
> dotted-circle is a part of the script ;)

Haha yeah. I still can't believe Roman Czyborra drew the original GNU
Unifont with those hideous dotted circles in it... (Yes, he knew they
weren't part of the script, but...)

My hope is to make it so that using multibyte char functions + wcwidth
is sufficient for _usable_ support for all languages in apps that run
on terminals. Then, as more users of these languages use the apps in
question, hopefully other things (like line folding in scripts without
word spacing, better spacing, integration with input methods, etc.)
will come.

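To make "multibyte char functions + wcwidth" concrete, here is a rough
sketch of the loop an app would run to find out how many columns a
string needs. The helper name str_columns is made up for this example,
and it assumes the caller has already done setlocale(LC_CTYPE, "") in
a UTF-8 locale. Whether the sum is big enough for a given conjunct
depends entirely on the width table behind wcwidth, which is exactly
the part that still has to be worked out.

```c
#define _XOPEN_SOURCE 700   /* for wcwidth() */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from any library): number of terminal
 * columns a multibyte string occupies, computed as the sum of
 * wcwidth() over its characters. Returns -1 if the string contains
 * an invalid multibyte sequence or a nonprintable character. */
int str_columns(const char *s)
{
    mbstate_t st;
    size_t len = strlen(s);
    int cols = 0;

    memset(&st, 0, sizeof st);      /* initial conversion state */
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;              /* invalid or truncated sequence */
        if (n == 0)
            break;                  /* null character: end of string */

        int w = wcwidth(wc);
        if (w < 0)
            return -1;              /* nonprintable character */

        cols += w;                  /* combining marks contribute 0 */
        s += n;
        len -= n;
    }
    return cols;
}
```

An app would call str_columns(name) before padding a column in "ls"
style output, for instance. Note that the app never sees glyphs here:
it only adds up per-character widths and leaves shaping to the
terminal.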
Unlike most of the GUI projects working on these issues, my goal is
not to put word-processor-type layout in every app, just to fix what's
broken and make them usable with more languages.

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
