On Thu, Aug 03, 2006 at 08:41:35AM +0200, Werner LEMBERG wrote:
> > > What about using bitmap-only TrueType fonts, as planned by the X
> > > Windows people?
> >
> > Could you direct me to good information? I have serious doubts but
> > I'd at least like to read what they have to say.
>
> http://www.pps.jussieu.fr/~jch/software/xfree86-bitmap-fonts.html
>
> I don't know the current status of such fonts w.r.t. X Windows.

Wow, I had no idea that XF86 was so stupid as to gzip a file format
that was meant to be mmapped. That saves some disk space (dirt cheap)
at the expense of lots of load time and memory usage (expensive). If
disk space is really scarce, a compressed fs should be used instead so
that mmap is still available.
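To make the trade-off concrete, here is a rough sketch (not from any
existing code; the name map_font and the error handling are made up
purely for illustration) of what an uncompressed, mmap-friendly format
buys you: the kernel demand-pages the file and shares the pages among
every process using the font, whereas a gzipped font has to be
inflated into private heap memory in each process, every time.

    /* Sketch only: map an uncompressed font read-only so the kernel
     * can demand-page it and share the pages between processes. A
     * gzipped font would instead have to be decompressed into private,
     * unshareable heap memory per process. */
    #include <stddef.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static const void *map_font(const char *path, size_t *len)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return 0;

        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return 0; }

        void *p = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd); /* the mapping remains valid after close */
        if (p == MAP_FAILED) return 0;

        *len = st.st_size;
        return p; /* pages are faulted in lazily as glyphs are used */
    }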
> > Quite frankly it doesn't matter if FontForge supports a bitmap font
> > format because "xbitmap" is the ideal tool for making bitmap fonts.
>
> Please give an URL. Another good bitmap font editor is xmbdfed from
> Mark Leisher.

Oops, I meant "bitmap". It's a trivial Xaw app that's been included
with X since near the beginning. I have xmbdfed too, but the "Xm" part
of it makes it rather painful to use. Maybe it would be better if I
upgraded lesstif, but I generally dislike motif anyway, and since the
BDF model identifies characters with glyphs (which I think has been
well established to be a very bad idea) I'd rather use an editor that
just treats bitmaps as bitmaps without trying to treat them as fonts.

> > I also get the impression from some Apple papers I was browsing
> > recently that TTF/OpenType puts the burden of knowing how to stack
> > combining characters and produce ligatures onto the software rather
> > than the font. Under such a system, applications will never support
> > all scripts unless they use one of the unwieldy libraries with all
> > of this taken care of...
>
> This is the wrong impression. What you probably mean is that some
> language data needs to be preprocessed into a normalized form before
> it is fed into the font, for example Indic and Arabic scripts.

What sort of preprocessing? Reordering vowels? Replacement of Arabic
characters with the appropriate presentation forms?

> However, it is possible to add arbitrary tables to the font (which is
> another advantage of the SFNT format) which could move this
> preprocessing into the font.

Are there any papers on the SFNT format and its table language?

> > ...on the other hand, at least for bitmap fonts, simple rule-based
> > substitutions set up by the font designer can easily provide the
> > needed functionality with less than 5kb of code doing all the glyph
> > processing.
>
> This is handled by the GSUB table. There are many different formats,
> beginning with simple glyph replacing and ending with complex
> contextual glyph substitutions.

I found some docs on the format from MS, but they were hopelessly
poorly written and contained no information on how the font represents
the conditions under which the substitution should be performed.
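For reference, the simple end of GSUB is not that bad. As far as I can
read the OpenType spec (so take this as a sketch, not gospel), a
"single substitution" subtable is just a coverage table plus either a
delta or a list of replacement glyph ids, all big-endian 16-bit
fields:

    /* GSUB LookupType 1 ("single substitution"), roughly as described
     * in the OpenType spec -- double-check against the spec before
     * relying on this. All fields are big-endian on disk. */
    struct gsub_single_subst_fmt1 {
        unsigned short format;         /* == 1 */
        unsigned short coverage_off;   /* offset to the Coverage table,
                                          from the start of this subtable */
        short          delta_glyph_id; /* output glyph = input glyph + delta */
    };

    struct gsub_single_subst_fmt2 {
        unsigned short format;         /* == 2 */
        unsigned short coverage_off;
        unsigned short glyph_count;
        /* followed by: unsigned short substitute[glyph_count];
           the i-th glyph listed in the Coverage table is replaced
           by substitute[i] */
    };

The conditions presumably live in the contextual lookup types (5 and
6), which, as far as I can tell, match sequences of coverage or class
tables before/at/after the current glyph and then run nested lookups
to do the actual replacement -- exactly the part the docs explain so
badly.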
> > Right now we're at an unfortunate point where the core X font system
> > has been deprecated, but there is nothing suitable in its place.
>
> You should contact Keith Packard regarding this issue. I think there
> is just some delay in the conversion of PCFs to SFNT due to more
> important problems.

How will this solve anything? The core protocol is still unacceptable
because all the glyph info has to be transmitted to the client, and
this info is way too big. The core protocol also seems unable to
perform any sort of nontrivial character->glyph mapping. Must every
application have font-specific information on how to do this, even
though the fonts are located on the server side and thus inaccessible
to the app? Or am I missing something?

> > Moreover non-X unix consoles are essentially deprecated as well
> > since they lack all but some patronizing Euro-centric 512-glyph
> > "Unicode" support. Do you think someone is going to integrate
> > FreeType into Linux anytime soon? :)
>
> Why not? FreeType is very modular by design; it would be possible to
> remove almost everything but bitmap-only SFNT handling. Note,
> however, that this library doesn't interpret GSUB and other advanced
> OpenType tables by itself. You need Pango or something similar for
> this.

As far as I can tell, if it's not doing outline rendering and not
using GSUB, etc., then FreeType isn't really doing anything except
parsing the file format and looking up glyphs. I don't see how this
would merit including FreeType at all; a trivial ~200-line
implementation should be able to do the same unless the file format is
hopelessly painful to work with.

> > All problem solving is about choosing the right tool for the job.
> > Storing bitmap fonts in the TTF/OpenType framework is like using a
> > nuclear missile to toast fruit flies, or like driving an SUV to
> > commute to the office...
>
> You are underestimating the problem, I think.

The only part I'm potentially underestimating is the extent of context
information needed to choose a glyph. I'm aware that in extremely nice
rendering of script-style fonts you can often need context several
characters away, but as far as I know all scripts can be rendered in
their basic "print" form with only nearest-neighbor context. What I'm
unsure of is whether nearest-neighbor should mean character neighbors
only, or all character-CELL neighbors (which could be many more with
combining). I suspect it's the latter.

> The proper bitmap
> format is the least important thing, and the compact SFNT bitmap
> formats are not a bad choice IMHO. Much more important is the ability
> to store the glyph substitution tables efficiently.

What I mean by bitmap font format is the character->glyph mapping
system. Obviously the format of the actual glyph bitmaps is simple. Is
3-4 bytes per potential substitution inefficient? That's what I'm
looking at. This is not counting context definitions specifying when
the substitution would be applied, but these definitions can often be
reused by many glyphs in the same script. As a simple example, all the
Latin capital letters can share the "if superscribed combining mark is
attached" context. As a more nontrivial example, in the Tibetan font
I've drawn, there are many characters that share "subjoined below ra"
and "subjoined below sa" rules in order to have the appropriate size
and placement.
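Concretely, the kind of record I have in mind looks something like the
following (a rough sketch only -- the names and exact field widths are
illustrative, not the actual draft format), and it comes out at 4
bytes per substitution precisely because the context definitions are
shared:

    /* Sketch of a compact substitution rule; not the actual draft
     * format. Contexts ("superscribed mark attached", "subjoined
     * below ra", ...) live in a small shared table and rules only
     * reference them by index, so each rule stays at 4 bytes. */
    struct subst_rule {
        unsigned short glyph; /* glyph the rule applies to */
        unsigned char  ctx;   /* index into the shared context table */
        unsigned char  repl;  /* replacement, as an offset from a
                                 per-script glyph base */
    };

    /* Pick a glyph for a character cell: linear scan for clarity; a
     * real implementation would sort the rules and binary-search. */
    static unsigned pick_glyph(unsigned glyph,
                               const struct subst_rule *rules, int nrules,
                               int (*ctx_matches)(int ctx, const void *cell),
                               const void *cell, unsigned script_base)
    {
        for (int i = 0; i < nrules; i++)
            if (rules[i].glyph == glyph && ctx_matches(rules[i].ctx, cell))
                return script_base + rules[i].repl;
        return glyph; /* no rule matched: keep the default glyph */
    }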
> > When it comes to character cell fonts (which is an even narrower
> > problem field than bitmap fonts), the goal is something that can
> > provide the baseline support for readable and correct display of any
> > script
>
> What about top-to-down scripts like Mongolian which can't be written
> horizontally? So I repeat my question: Which scripts do you imagine
> to support?

Mongolian can be and is written horizontally as well. Certainly you
can write vertical Mongolian in a Mongolian-only editor, or in a
top-down context in some sort of higher level word processor or markup
file, but the idea that you should see Mongolian filenames vertically
when you type "ls", somehow mixed in with other filenames in
horizontal orientation, is hopeless. mlterm (which I find works very
poorly but seems to be the only implementation of a multilingual
terminal) does support vertical orientation, but only if you run it in
vertical mode, in which case everything (including Latin) is printed
vertically. IMO this is the only sane approach.

Especially since Mongolian _can_ be written horizontally, you need to
treat horizontal versus vertical orientation as a localization or user
preference applying to the system as a whole (in the absence of higher
level markup), not as a property of the script.

> > [...] I'm extremely bitter about the sad state of m17n on unix and
> > the fact that there is not even one working Unicode terminal with
> > simultaneous support for all scripts.
>
> There is a simple reason for this: What you want to do is impossible.

Maybe so, but the state is also much sadder than I made it sound.
Basically there's only one terminal that supports much more than
Western, CJK, Thai, Hebrew, and maybe a few other scripts. That one,
mlterm, lacks the ability to use information in the fonts for correct
combining and only supports Indic languages because it uses special
script-specific libraries.

> There will never be a program which supports `all' scripts. Just
> think of Urdu, a special variant of Arabic, which isn't just a R2L
> script: It actually has this writing direction:
>
>       /  /  /
>      /  /  /
>     /  /  /
>
> The longer a word, the bigger is its vertical height.

I'm told there's also a script that runs R2L and L2R alternating on
successive rows, i.e. snakes back and forth, though I've never
actually been told what it is, so perhaps it's a myth. Whether it's
possible or reasonable to support such things remains to be seen. IMO,
like Mongolian, some of these issues need to be treated as a locale or
user preference issue instead of a necessity of the script. Honestly,
if I used a R2L language, I would be much happier having the whole
terminal run right-to-left (including the Latin text used for unix
commands.. possibly with glyphs mirrored..?) than having to deal with
the headache of bidirectional text and my language being treated as a
second-class part of the interface. But then again there's numerals
and all kinds of other mess to screw it up. :)

Anyway the question with stuff like Urdu is whether it's imperative to
typeset the text in its standard written form or whether a 'computer
style' line-based form or something is acceptable. Keep in mind that
even Latin is written differently on a terminal than it is when
written by hand or in print; the "i" is as wide as the "m", for
instance. I'm not saying that it's justifiable to have crap support
for languages or scripts, just that sometimes a language has to adapt
and develop alternate presentation forms that _will_ work with
technology, or risk becoming irrelevant as technology becomes more
important in society.

> > So with that said, I'll continue on with my draft bitmap font format
> > (which already has a lot more simplifications -- remember, a work of
> > art is only complete when you can't find anything left to _remove_
> > from it), write my 5kb of code, integrate it into uuterm, and
> > somewhere in the next few months aim to have the first working
> > Unicode terminal emulator... in a 50kb static binary.
>
> Good luck in handling Arabic and Indic scripts -- and Mongolian :-)

Indic is easy. Actually this is the part I'm most bitter about --
people treating something that should be easy as if it were a huge
unsolved problem and then not supporting it. Mongolian too, as long as
you follow the outline above.

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
