After much work, I finally have a working (but still experimental) version of uuterm and the "ucf" bitmap font format I proposed in August. Source for uuterm is browsable at http://svn.mplayerhq.hu/uuterm/ and a sample ucf font is linked from the included README.

Since ucf is probably more interesting to members of this list than any particular piece of software, I'll skip the details of uuterm and get straight to ucf. I based the design loosely on Markus Kuhn's old proposal for a bitmap font format that recognizes the difference between glyphs and characters. "Source code" for a ucf font looks like:

    # sa+la
    :000000007B1129650300000000000000 0F66+0FB3
    # sa+*
    :000000007B2945030000000000000000 0F66+[0F90-0FAC] 0F66+[0FAE-0FB0] 0F66+[0FB4-0FBC] ...
    # ra la sha ssa sa
    :000000003E08081C2201010000000000 0F62 0F6A
    :00000000394545491D03010000000000 0F63
    :000000000709096F3911090101000000 0F64
    :000000007048487B4E44484040000000 0F65
    :000000007B1129456313010000000000 0F66

The long hex number is a glyph bitmap, which can be edited easily with a program like Roman Czyborra's "hexdraw" (from the GNU unifont project), or imported/exported from other formats. Unlike unifont, however, there is no limitation on character cell size. The numbers that follow are the characters that the glyph can represent, and in which contexts.

In the above example, the first glyph is used for the Tibetan consonant "sa" (U+0F66) when a combining "la" (U+0FB3) is attached to it. The second glyph is used for "sa" when a combining character from any of the listed ranges is attached, and the third glyph for "sa" (the plain 0F66 entry at the end) is used in any case not matching the previous ones.

Aside from the WITH_ATTACHED rule (represented by "+"), the format also has ATTACHED_TO (for shaping combining marks depending on the base character or previous combining mark) as well as rules for examining the character(s) in the previous/next cell (in visual order). Combined with visual reordering done by the application, I believe this is sufficient for nice presentation of Indic text (not perfect, but on a level comparable to rendering English text monospaced).
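
To make the first-match behaviour described above concrete, here is a minimal sketch in C. It is not taken from uuterm: the struct layout and the names pick_glyph, decode_glyph and sa_rules are hypothetical, only the WITH_ATTACHED rule is handled, and the decoder assumes an 8x16 cell stored as one byte per row (which is what the 32 hex digits in the sample suggest, but is an assumption here).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only; not uuterm's data structures. */
struct rule {
    const char *hex;   /* glyph bitmap as written in the font source */
    unsigned base;     /* base character the glyph represents */
    int cond;          /* nonzero: require an attached mark in [lo, hi] */
    unsigned lo, hi;   /* inclusive WITH_ATTACHED range */
};

/* Rules for "sa" from the sample, in source order. The ranges elided
 * by "..." in the sample are deliberately left out. */
const struct rule sa_rules[] = {
    { "000000007B1129650300000000000000", 0x0F66, 1, 0x0FB3, 0x0FB3 },
    { "000000007B2945030000000000000000", 0x0F66, 1, 0x0F90, 0x0FAC },
    { "000000007B2945030000000000000000", 0x0F66, 1, 0x0FAE, 0x0FB0 },
    { "000000007B2945030000000000000000", 0x0F66, 1, 0x0FB4, 0x0FBC },
    { "000000007B1129456313010000000000", 0x0F66, 0, 0, 0 },
};
const size_t sa_rule_count = sizeof sa_rules / sizeof sa_rules[0];

/* Return the bitmap of the first rule matching base + attached mark
 * (attached == 0 means no combining mark), or NULL if none matches. */
const char *pick_glyph(const struct rule *rules, size_t n,
                       unsigned base, unsigned attached)
{
    size_t i;
    assert(rules != NULL);
    for (i = 0; i < n; i++) {
        if (rules[i].base != base)
            continue;
        if (!rules[i].cond ||
            (attached >= rules[i].lo && attached <= rules[i].hi))
            return rules[i].hex;
    }
    return NULL;
}

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode 32 hex digits into 16 rows of 8 pixels (assumed layout).
 * Returns 0 on success, -1 on malformed input. */
int decode_glyph(const char *hex, unsigned char rows[16])
{
    int i;
    if (strlen(hex) != 32)
        return -1;
    for (i = 0; i < 16; i++) {
        int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        rows[i] = (unsigned char)(hi << 4 | lo);
    }
    return 0;
}
```

With this table, pick_glyph(sa_rules, sa_rule_count, 0x0F66, 0x0FB3) selects the sa+la bitmap, while passing 0 for the attached mark falls through to the plain "sa" glyph. Keeping the rules in source order is what makes the unconditional entry act as the fallback.
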

I will be converting GNU unifont and/or other free 8x16-cell fonts to make a fairly complete UCF font with all the necessary contextual glyph replacements, but it will be a slow process and I'm in no hurry. I'd welcome anyone who gets interested in it to work on such a thing. I'd also be interested in studying the feasibility of getting support for UCF into various *NIX consoles.

A few comments on "Why not just use OpenType??":

- The GSUB model does not adapt well to a character cell device where characters are organized into cells and where arbitrary string replacements don't make sense.

- The glyph metric data is as large as the actual glyphs, doubling font size. Charcell fonts don't need any glyph metrics.

- I don't think you can implement OpenType in less than 100 lines of C. The UCF char-to-glyph mapping algorithm is easy to implement and tiny.

- Personally I like solutions that are adapted to the nature of the particular problem (character cell device) rather than trying to apply an overly general solution that will be awkward at best.

- Something like UCF has a chance of getting into *NIX console drivers someday. I doubt anything OpenType-based would ever pass the necessary bloat tests to get integrated at such a low level.

Rich

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
