On Thu, Aug 03, 2006 at 08:41:35AM +0200, Werner LEMBERG wrote:
> > > What about using bitmap-only TrueType fonts, as planned by the X
> > > Windows people?
> >
> > Could you direct me to good information? I have serious doubts but
> > I'd at least like to read what they have to say.
>
> http://www.pps.jussieu.fr/~jch/software/xfree86-bitmap-fonts.html
>
> I don't know the current status of such fonts w.r.t. X Windows.

Wow, I had no idea that XF86 was so stupid as to gzip a file format
that was meant to be mmapped. That saves some disk space (dirt cheap)
at the expense of lots of load time and memory usage (expensive). If
disk space is really scarce, a compressed fs should be used instead so
that mmap is still available.
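To make the trade-off concrete, here is a rough sketch (not from any
existing code; the name map_font and the error handling are made up
purely for illustration) of what an uncompressed, mmap-friendly format
buys you: the kernel demand-pages the file and shares the pages among
every process using the font, whereas a gzipped font has to be
inflated into private heap memory in each process, every time.

    /* Sketch only: map an uncompressed font read-only so the kernel
     * can demand-page it and share the pages between processes. A
     * gzipped font would instead have to be decompressed into private,
     * unshareable heap memory per process. */
    #include <stddef.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static const void *map_font(const char *path, size_t *len)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return 0;

        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return 0; }

        void *p = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd); /* the mapping remains valid after close */
        if (p == MAP_FAILED) return 0;

        *len = st.st_size;
        return p; /* pages are faulted in lazily as glyphs are used */
    }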
> > Quite frankly it doesn't matter if FontForge supports a bitmap font
> > format because "xbitmap" is the ideal tool for making bitmap fonts.
>
> Please give an URL. Another good bitmap font editor is xmbdfed from
> Mark Leisher.

Oops, I meant "bitmap". It's a trivial Xaw app that's been included
with X since near the beginning. I have xmbdfed too, but the "Xm" part
of it makes it rather painful to use. Maybe it would be better if I
upgraded lesstif, but I generally dislike motif anyway, and since the
BDF model identifies characters with glyphs (which I think has been
well established to be a very bad idea) I'd rather use an editor that
just treats bitmaps as bitmaps without trying to treat them as fonts.

> > I also get the impression from some Apple papers I was browsing
> > recently that TTF/OpenType puts the burden of knowing how to stack
> > combining characters and produce ligatures onto the software rather
> > than the font. Under such a system, applications will never support
> > all scripts unless they use one of the unwieldy libraries with all
> > of this taken care of...
>
> This is the wrong impression. What you probably mean is that some
> language data needs to be preprocessed into a normalized form before
> it is fed into the font, for example Indic and Arabic scripts.

What sort of preprocessing? Reordering vowels? Replacement of Arabic
characters with the appropriate presentation forms?

> However, it is possible to add arbitrary tables to the font (which is
> another advantage of the SFNT format) which could move this
> preprocessing into the font.

Are there any papers on the SFNT format and its table language?

> > ...on the other hand, at least for bitmap fonts, simple rule-based
> > substitutions set up by the font designer can easily provide the
> > needed functionality with less than 5kb of code doing all the glyph
> > processing.
>
> This is handled by the GSUB table. There are many different formats,
> beginning with simple glyph replacing and ending with complex
> contextual glyph substitutions.

I found some docs on the format from MS, but they were hopelessly
poorly written and contained no information on how the font represents
the conditions under which the substitution should be performed.
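For reference, the simple end of GSUB is not that bad. As far as I can
read the OpenType spec (so take this as a sketch, not gospel), a
"single substitution" subtable is just a coverage table plus either a
delta or a list of replacement glyph ids, all big-endian 16-bit
fields:

    /* GSUB LookupType 1 ("single substitution"), roughly as described
     * in the OpenType spec -- double-check against the spec before
     * relying on this. All fields are big-endian on disk. */
    struct gsub_single_subst_fmt1 {
        unsigned short format;         /* == 1 */
        unsigned short coverage_off;   /* offset to the Coverage table,
                                          from the start of this subtable */
        short          delta_glyph_id; /* output glyph = input glyph + delta */
    };

    struct gsub_single_subst_fmt2 {
        unsigned short format;         /* == 2 */
        unsigned short coverage_off;
        unsigned short glyph_count;
        /* followed by: unsigned short substitute[glyph_count];
           the i-th glyph listed in the Coverage table is replaced
           by substitute[i] */
    };

The conditions presumably live in the contextual lookup types (5 and
6), which, as far as I can tell, match sequences of coverage or class
tables before/at/after the current glyph and then run nested lookups
to do the actual replacement -- exactly the part the docs explain so
badly.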
> > Right now we're at an unfortunate point where the core X font system
> > has been deprecated, but there is nothing suitable in its place.
>
> You should contact Keith Packard regarding this issue. I think there
> is just some delay in the conversion of PCFs to SFNT due to more
> important problems.

How will this solve anything? The core protocol is still unacceptable
because all the glyph info has to be transmitted to the client, and
this info is way too big. The core protocol also seems unable to
perform any sort of nontrivial character->glyph mapping. Must every
application have font-specific information on how to do this, even
though the fonts are located on the server side and thus inaccessible
to the app? Or am I missing something?

> > Moreover non-X unix consoles are essentially deprecated as well
> > since they lack all but some patronizing Euro-centric 512-glyph
> > "Unicode" support. Do you think someone is going to integrate
> > FreeType into Linux anytime soon? :)
>
> Why not? FreeType is very modular by design; it would be possible to
> remove almost everything but bitmap-only SFNT handling. Note,
> however, that this library doesn't interpret GSUB and other advanced
> OpenType tables by itself. You need Pango or something similar for
> this.

As far as I can tell, if it's not doing outline rendering and not
using GSUB, etc., then FreeType isn't really doing anything except
parsing the file format and looking up glyphs. I don't see how this
would merit including FreeType at all; a trivial ~200-line
implementation should be able to do the same unless the file format is
hopelessly painful to work with.

> > All problem solving is about choosing the right tool for the job.
> > Storing bitmap fonts in the TTF/OpenType framework is like using a
> > nuclear missile to toast fruit flies, or like driving an SUV to
> > commute to the office...
>
> You are underestimating the problem, I think.

The only part I'm potentially underestimating is the extent of context
information needed to choose a glyph. I'm aware that in extremely nice
rendering of script-style fonts you can often need context several
characters away, but as far as I know all scripts can be rendered in
their basic "print" form with only nearest-neighbor context. What I'm
unsure of is whether nearest-neighbor should mean character neighbors
only, or all character-CELL neighbors (which could be many more with
combining). I suspect it's the latter.

> The proper bitmap
> format is the least important thing, and the compact SFNT bitmap
> formats are not a bad choice IMHO. Much more important is the ability
> to store the glyph substitution tables efficiently.

What I mean by bitmap font format is the character->glyph mapping
system. Obviously the format of the actual glyph bitmaps is simple. Is
3-4 bytes per potential substitution inefficient? That's what I'm
looking at. This is not counting context definitions specifying when
the substitution would be applied, but these definitions can often be
reused by many glyphs in the same script. As a simple example, all the
Latin capital letters can share the "if superscribed combining mark is
attached" context. As a more nontrivial example, in the Tibetan font
I've drawn, there are many characters that share "subjoined below ra"
and "subjoined below sa" rules in order to have the appropriate size
and placement.
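Concretely, the kind of record I have in mind looks something like the
following (a rough sketch only -- the names and exact field widths are
illustrative, not the actual draft format), and it comes out at 4
bytes per substitution precisely because the context definitions are
shared:

    /* Sketch of a compact substitution rule; not the actual draft
     * format. Contexts ("superscribed mark attached", "subjoined
     * below ra", ...) live in a small shared table and rules only
     * reference them by index, so each rule stays at 4 bytes. */
    struct subst_rule {
        unsigned short glyph; /* glyph the rule applies to */
        unsigned char  ctx;   /* index into the shared context table */
        unsigned char  repl;  /* replacement, as an offset from a
                                 per-script glyph base */
    };

    /* Pick a glyph for a character cell: linear scan for clarity; a
     * real implementation would sort the rules and binary-search. */
    static unsigned pick_glyph(unsigned glyph,
                               const struct subst_rule *rules, int nrules,
                               int (*ctx_matches)(int ctx, const void *cell),
                               const void *cell, unsigned script_base)
    {
        for (int i = 0; i < nrules; i++)
            if (rules[i].glyph == glyph && ctx_matches(rules[i].ctx, cell))
                return script_base + rules[i].repl;
        return glyph; /* no rule matched: keep the default glyph */
    }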
> > When it comes to character cell fonts (which is an even narrower
> > problem field than bitmap fonts), the goal is something that can
> > provide the baseline support for readable and correct display of any
> > script
>
> What about top-to-down scripts like Mongolian which can't be written
> horizontally? So I repeat my question: Which scripts do you imagine
> to support?

Mongolian can be and is written horizontally as well. Certainly you
can write vertical Mongolian in a Mongolian-only editor, or in a
top-down context in some sort of higher level word processor or markup
file, but the idea that you should see Mongolian filenames vertically
when you type "ls", somehow mixed in with other filenames in
horizontal orientation, is hopeless. mlterm (which I find works very
poorly but seems to be the only implementation of a multilingual
terminal) does support vertical orientation, but only if you run it in
vertical mode, in which case everything (including Latin) is printed
vertically. IMO this is the only sane approach.

Especially since Mongolian _can_ be written horizontally, you need to
treat horizontal versus vertical orientation as a localization or user
preference applying to the system as a whole (in the absence of higher
level markup), not as a property of the script.

> > [...] I'm extremely bitter about the sad state of m17n on unix and
> > the fact that there is not even one working Unicode terminal with
> > simultaneous support for all scripts.
>
> There is a simple reason for this: What you want to do is impossible.

Maybe so, but the state is also much sadder than I made it sound.
Basically there's only one terminal that supports much more than
Western, CJK, Thai, Hebrew, and maybe a few other scripts. That one,
mlterm, lacks the ability to use information in the fonts for correct
combining and only supports Indic languages because it uses special
script-specific libraries.

> There will never be a program which supports `all' scripts. Just
> think of Urdu, a special variant of Arabic, which isn't just a R2L
> script: It actually has this writing direction:
>
>       /  /  /
>      /  /  /
>     /  /  /
>
> The longer a word, the bigger is its vertical height.

I'm told there's also a script that runs R2L and L2R alternating on
successive rows, i.e. snakes back and forth, though I've never
actually been told what it is, so perhaps it's a myth. Whether it's
possible or reasonable to support such things remains to be seen. IMO,
like Mongolian, some of these issues need to be treated as a locale or
user preference issue instead of a necessity of the script. Honestly,
if I used a R2L language, I would be much happier having the whole
terminal run right-to-left (including the Latin text used for unix
commands.. possibly with glyphs mirrored..?) than having to deal with
the headache of bidirectional text and my language being treated as a
second-class part of the interface. But then again there's numerals
and all kinds of other mess to screw it up. :)

Anyway the question with stuff like Urdu is whether it's imperative to
typeset the text in its standard written form or whether a 'computer
style' line-based form or something is acceptable. Keep in mind that
even Latin is written differently on a terminal than it is when
written by hand or in print; the "i" is as wide as the "m", for
instance. I'm not saying that it's justifiable to have crap support
for languages or scripts, just that sometimes a language has to adapt
and develop alternate presentation forms that _will_ work with
technology, or risk becoming irrelevant as technology becomes more
important in society.

> > So with that said, I'll continue on with my draft bitmap font format
> > (which already has a lot more simplifications -- remember, a work of
> > art is only complete when you can't find anything left to _remove_
> > from it), write my 5kb of code, integrate it into uuterm, and
> > somewhere in the next few months aim to have the first working
> > Unicode terminal emulator... in a 50kb static binary.
>
> Good luck in handling Arabic and Indic scripts -- and Mongolian :-)

Indic is easy. Actually this is the part I'm most bitter about --
people treating something that should be easy as if it were a huge
unsolved problem and then not supporting it. Mongolian too, as long as
you follow the outline above.

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
