Announcing uuterm and ucf (universal charcell font)

Rich Felker Thu, 05 Oct 2006 15:06:14 -0700

After much work, I finally have a working (but still experimental)
version of uuterm and the "ucf" bitmap font format I proposed in
August. Source for uuterm is browsable at
http://svn.mplayerhq.hu/uuterm/ and a sample ucf font is linked from
the included README.


Since ucf is probably more interesting to members of this list than
any particular piece of software, I'll skip the details of uuterm and
get straight to ucf. I based the design loosely on Markus Kuhn's old
proposal for a bitmap font format that recognizes the difference
between glyphs and characters. "Source code" for a ucf font looks
like:

# sa+la
:000000007B1129650300000000000000 0F66+0FB3

# sa+*
:000000007B2945030000000000000000 0F66+[0F90-0FAC] 0F66+[0FAE-0FB0] 0F66+[0FB4-0FBC]

...

# ra la sha ssa sa
:000000003E08081C2201010000000000 0F62 0F6A
:00000000394545491D03010000000000 0F63
:000000000709096F3911090101000000 0F64
:000000007048487B4E44484040000000 0F65
:000000007B1129456313010000000000 0F66

The long hex number is a glyph bitmap, which can be edited easily with
a program like Roman Czyborra's "hexdraw" (from the GNU unifont
project), or imported/exported from other formats. Unlike unifont,
however, there is no limitation on character cell size.
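
To make the bitmap encoding concrete, here is a minimal Python sketch
(not part of uuterm) that prints one of the glyphs above as ASCII art.
It assumes the same layout as the 8x16 glyphs in unifont's .hex files:
rows stored top to bottom, each row padded to whole bytes, most
significant bit leftmost. The function name and the "width" parameter
are made up for illustration; the actual ucf tools may store cell
dimensions differently.

```python
def decode_glyph(hex_bitmap, width=8):
    """Turn a hex glyph string into a list of rows made of '#' and '.'.

    Assumption: rows run top to bottom, each row occupies a whole number
    of bytes, and the most significant bit is the leftmost pixel.
    """
    bytes_per_row = (width + 7) // 8
    total_bits = bytes_per_row * 8
    data = bytes.fromhex(hex_bitmap)
    rows = []
    for i in range(0, len(data), bytes_per_row):
        bits = int.from_bytes(data[i:i + bytes_per_row], "big")
        row = "".join(
            "#" if (bits >> (total_bits - 1 - x)) & 1 else "."
            for x in range(width)
        )
        rows.append(row)
    return rows


# The "la" glyph (U+0F63) from the example above, as an 8x16 cell.
for row in decode_glyph("00000000394545491D03010000000000"):
    print(row)
```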

The numbers that follow are the characters that the glyph can
represent, and in which contexts. In the above example, the first
glyph is used for the Tibetan consonant "sa" (U+0F66) when a combining
"la" (U+0FB3) is attached to it. The second glyph is used for "sa"
when a combining character from any of the listed ranges is attached,
and the third "sa" glyph (the plain 0F66 entry on the last line) is
used in any case not matching the previous ones.
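
The lookup described above can be sketched in a few lines of Python.
This is only an illustration of the WITH_ATTACHED rule, not the code
uuterm uses. It assumes that rules are tried in file order and the
first match wins, that a rule matches if any attached combining
character falls in its range, and that each glyph definition sits on a
single line. The names parse_ucf_source and find_glyph are
hypothetical.

```python
def parse_ucf_source(text):
    """Return (glyph_hex, base, lo, hi) rules in file order.

    lo..hi is the inclusive range of attached combining characters the
    rule requires; both are None for an unconditional mapping.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue  # skip comments and blank lines
        fields = line[1:].split()
        if not fields:
            continue
        glyph_hex, mappings = fields[0], fields[1:]
        for mapping in mappings:
            base, _, cond = mapping.partition("+")
            if not cond:
                lo = hi = None
            elif cond.startswith("["):
                lo_text, hi_text = cond.strip("[]").split("-")
                lo, hi = int(lo_text, 16), int(hi_text, 16)
            else:
                lo = hi = int(cond, 16)
            rules.append((glyph_hex, int(base, 16), lo, hi))
    return rules


def find_glyph(rules, base, attached=()):
    """Pick the first rule for `base` whose condition is satisfied."""
    for glyph_hex, rule_base, lo, hi in rules:
        if rule_base != base:
            continue
        if lo is None or any(lo <= c <= hi for c in attached):
            return glyph_hex
    return None  # no glyph for this character


SAMPLE = """\
# sa+la
:000000007B1129650300000000000000 0F66+0FB3
# sa+*
:000000007B2945030000000000000000 0F66+[0F90-0FAC] 0F66+[0FAE-0FB0] 0F66+[0FB4-0FBC]
# ra sa
:000000003E08081C2201010000000000 0F62 0F6A
:000000007B1129456313010000000000 0F66
"""

rules = parse_ucf_source(SAMPLE)
print(find_glyph(rules, 0x0F66, [0x0FB3]))  # sa with la attached
print(find_glyph(rules, 0x0F66))            # plain sa
```

Because the unconditional 0F66 entry comes last, it only applies when
neither of the more specific rules has matched.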

Aside from the WITH_ATTACHED rule (represented by "+"), the format
also has ATTACHED_TO (for shaping combining marks depending on the
base character or previous combining mark) as well as rules for
examining the character(s) in the previous/next cell (in visual
order). Provided the application itself applies the visual reordering
rules, I believe this is sufficient for nice (not perfect, but on a
comparable level to rendering English text monospaced) presentation
of Indic text.
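
All of these rules presuppose that the text has already been divided
into cells, each holding a base character plus whatever is attached to
it. A rough Python sketch of that grouping step follows. It is a
simplification and not how uuterm does it: it treats only nonspacing
and enclosing marks (Unicode categories Mn and Me) as attached, gives
every other character its own cell, and ignores double-width
characters and visual reordering entirely. The function name is made
up.

```python
import unicodedata


def split_into_cells(text):
    """Group a string into (base, [attached marks]) cells.

    Assumption: categories Mn and Me attach to the preceding cell;
    everything else starts a new cell.
    """
    cells = []
    for ch in text:
        if cells and unicodedata.category(ch) in ("Mn", "Me"):
            cells[-1][1].append(ch)
        else:
            cells.append((ch, []))
    return cells


# Tibetan sa + subjoined la, followed by a plain la: two cells.
for base, attached in split_into_cells("\u0f66\u0fb3\u0f63"):
    print("U+%04X" % ord(base), ["U+%04X" % ord(c) for c in attached])
```

With cells in this form, a WITH_ATTACHED rule looks at a base and its
list of marks, an ATTACHED_TO rule looks at a mark and what precedes
it in the same cell, and the previous/next-cell rules look at the
neighbouring entries in the list.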



I will be converting GNU unifont and/or other free 8x16-cell fonts to
make a fairly complete UCF font with all the necessary contextual
glyph replacements, but it will be a slow process and I'm in no hurry.
I'd welcome others who get interested in it to work on such a thing.
I'd also be interested in studying the feasibility of getting support
for UCF in various *NIX consoles.



A few comments on "Why not just use OpenType??":

- The GSUB model does not adapt well to a character cell device where
  characters are organized into cells and where arbitrary string
  replacements don't make sense.

- The glyph metric data is as large as the actual glyphs, doubling
  font size. Charcell fonts don't need any glyph metrics.

- I don't think you can implement OpenType in less than 100 lines of
  C. The UCF char-to-glyph mapping algorithm is easy to implement and
  tiny.

- Personally I like solutions that are adapted to the nature of the
  particular problem (character cell device) rather than trying to
  apply an overly general solution that will be awkward at best.

- Something like UCF has a chance of getting into *NIX console drivers
  someday. I doubt anything OpenType-based would ever pass the
  necessary bloat tests to get integrated at such a low level.


Rich


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
