> I have xmbdfed too but the "Xm" part of it makes it rather painful
> to use.

At least for Linux, Mark provides a binary with a statically linked
Motif library, AFAIK.

> > What you probably mean is that some language data needs to be
> > preprocessed into a normalized form before it is fed into the
> > font, for example Indic and Arabic scripts.
>
> What sort of preprocessing? Reordering vowels? Replacement of Arabic
> characters with the appropriate presentation forms?

Arabic needs tagging of glyphs as being `initial', `medial', `final',
or `isolated', as specified in the Unicode book.  Since this is
identical for all fonts, the OpenType designers have decided not to
make this information part of the font itself.  In the long run,
this makes the fonts smaller.  Something similar is done for Indic --
on the OpenType list you can right now find a discussion about a
reimplementation of Indic font handling.
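
To make the tagging step a bit more concrete, here is a rough sketch
in Python.  It is only my illustration, not code from any shaping
engine: the joining table is a tiny hand-written subset of the Unicode
joining types, the function name `tag_forms' is made up, and combining
marks as well as ZWJ/ZWNJ are ignored entirely.  The returned tags use
the names of the corresponding OpenType features.

```python
# Illustrative subset of Unicode joining types; a real implementation
# would use the complete table from the Unicode Character Database.
DUAL = "D"    # joins to the preceding and the following letter
RIGHT = "R"   # joins to the preceding letter only

JOINING_TYPE = {
    "\u0628": DUAL,   # ARABIC LETTER BEH
    "\u062A": DUAL,   # ARABIC LETTER TEH
    "\u0643": DUAL,   # ARABIC LETTER KAF
    "\u0644": DUAL,   # ARABIC LETTER LAM
    "\u0645": DUAL,   # ARABIC LETTER MEEM
    "\u0627": RIGHT,  # ARABIC LETTER ALEF
    "\u062F": RIGHT,  # ARABIC LETTER DAL
    "\u0631": RIGHT,  # ARABIC LETTER REH
    "\u0648": RIGHT,  # ARABIC LETTER WAW
}


def tag_forms(text):
    """Return one tag per character of `text' (in logical order):
    'isol', 'init', 'medi', or 'fina'.  Unknown characters are
    treated as non-joining (an assumption of this sketch)."""
    types = [JOINING_TYPE.get(ch) for ch in text]
    forms = []
    for i, t in enumerate(types):
        prev_t = types[i - 1] if i > 0 else None
        next_t = types[i + 1] if i + 1 < len(types) else None
        # A letter connects backwards only if the previous letter
        # is able to connect forwards, i.e. is dual-joining.
        joins_prev = t in (DUAL, RIGHT) and prev_t == DUAL
        # Only dual-joining letters connect forwards.
        joins_next = t == DUAL and next_t in (DUAL, RIGHT)
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms


# KAF TEH ALEF BEH ("kitab")
print(tag_forms("\u0643\u062A\u0627\u0628"))
# → ['init', 'medi', 'fina', 'isol']
```

Note that nothing in this function depends on a particular font,
which is the reason why the information can live outside of the font.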

> > However, it is possible to add arbitrary tables to the font (which
> > is another advantage of the SFNT format) which could move this
> > preprocessing into the font.
>
> Are there any papers on the SFNT format and its table language?

Here are the two main references.

  http://www.microsoft.com/typography/SpecificationsOverview.mspx
  http://developer.apple.com/textfonts/TTRefMan/


> > This is handled by the GSUB table.  There are many different
> > formats, beginning with simple glyph replacing and ending with
> > complex contextual glyph substitutions.
>
> I found some docs on the format from MS, but they were hopelessly
> poorly written and contained no information on how the font
> represents the conditions under which the substitution should be
> performed.

The process is simple (at least in theory -- there are many tricky
details): A font contains a number of `features' like `use small caps'
or `use old ligatures', or `use a different set of digits'.  Each
feature consists of an ordered set of `lookups'.

Having a string of input character codes (mapped to glyph indices),
you apply the first lookup table to the whole string, then you start
again at the beginning and process the next one, and so on until all
lookup tables have been applied.
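
The process described above can be sketched in a few lines of Python.
This is a strongly simplified model and not the real GSUB data layout:
a `lookup' is just a dictionary mapping a tuple of input glyph names
to a tuple of output glyph names, the glyph names and the functions
`apply_lookup' and `apply_feature' are made up, and trying the longest
match first is a choice of this sketch, not a rule taken from the
specification.

```python
def apply_lookup(glyphs, lookup):
    """Run a single lookup over the whole glyph string, left to right."""
    # Try longer input sequences first so that 'f f i' wins over 'f i'.
    lengths = sorted({len(key) for key in lookup}, reverse=True)
    out = []
    i = 0
    while i < len(glyphs):
        for n in lengths:
            key = tuple(glyphs[i:i + n])
            if len(key) == n and key in lookup:
                out.extend(lookup[key])
                i += n
                break
        else:
            # No substitution matched here; copy the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


def apply_feature(glyphs, lookups):
    """Apply the ordered lookups of one feature, one after the other.
    Each lookup sees the complete output of the previous one."""
    for lookup in lookups:
        glyphs = apply_lookup(glyphs, lookup)
    return glyphs


# A hypothetical ligature feature consisting of a single lookup.
ligatures = [{
    ("f", "f", "i"): ("f_f_i",),
    ("f", "i"): ("f_i",),
}]
print(apply_feature(["o", "f", "f", "i", "c", "e"], ligatures))
# → ['o', 'f_f_i', 'c', 'e']
```

Since every lookup starts again at the beginning of the string, the
order of the lookups matters: a later lookup can act on glyphs which
have been produced by an earlier one.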

> How will this solve anything?  The core protocol is still
> unacceptable because all the glyph info has to be transmitted to the
> client, and this info is way too big.

AFAIK it is possible to have fonts on the client side, avoiding the
overhead of transmitting fonts.

> The core protocol also seems unable to perform any sort of
> nontrivial character->glyph mapping.  Must every application have
> font-specific information on how to do this, even though the fonts
> are located on the server side and thus inaccessible to the app? Or
> am I missing something?

Please read this:

  http://keithp.com/~keithp/talks/usenix2001/xrender/

It discusses the X Rendering Extension, which has meanwhile become
standard, I think.


> As far as I can tell, if it's not doing outline rendering and not
> using GSUB, etc. then FreeType isn't really doing anything except
> parsing the file format and looking up glyphs. I don't see how this
> would merit including FreeType at all; a trivial ~200-line
> implementation should be able to do the same unless the file format is
> hopelessly painful to work with.

You still need code to handle the SFNT format.  As mentioned in
another mail, you can compile FreeType without any support for outline
formats, using SFNT bitmap fonts only.

> What I mean by bitmap font format is the character->glyph mapping
> system.

I doubt that you will find something really better than the abilities
of the GSUB and GPOS tables.

> Is 3-4 bytes per potential substitution inefficient? That's what I'm
> looking at. This is not counting context definitions specifying when
> the substitution would be applied, but these definitions can often
> be reused by many glyphs in the same script. As a simple example all
> the Latin capital letters can share the "if superscribed combining
> mark is attached" context.

In OpenType parlance this is called a `glyph class', defined in the
GDEF table.

> Mongolian can be and is written horizontally as well.

Using Cyrillic, yes, but not the traditional script, AFAIK.

> Certainly you can write vertical Mongolian in a Mongolian-only
> editor, or in a top-down context in some sort of higher level word
> processor or markup file, but the idea that you should see Mongolian
> filenames vertically when you type "ls" somehow mixed in with other
> filenames in horizontal orientation is hopeless.

Exactly.  We are again at the point where we have to define which
scripts should be supported...

> I'm told there's also a script that runs R2L and L2R alternating on
> successive rows, i.e. snakes back and forth, though I've never
> actually been told what it is so perhaps it's a myth.

This is called `boustrophedon'.  Ancient Greek used it, and Rongorongo
also (the undeciphered script from Easter Island).

> Whether it's possible or reasonable to support such things remains
> to be seen.

It's not reasonable IMHO.  Another (quite natural) limitation of the
scripts to support.

> Anyway the question with stuff like Urdu is whether it's imperative
> to typeset the text in its standard written form or whether a
> 'computer style' line-based form or something is acceptable.

Ah, this is similar to the discussion whether it is acceptable to
represent the German `ü', `ä', and `ö' with `ue', `ae', and `oe',
respectively.  For me as a native German speaker, this is extremely
ugly, and still a lot of computerized systems used in, say, public
transport facilities are displaying this.

So my answer is: No, this is not acceptable.

> I'm not saying that it's justifiable to have crap support for
> languages or scripts, just that sometimes a language has to adapt
> and develop alternate presentation forms that _will_ work with
> technology, or risk becoming irrelevant as technology becomes more
> important in society.

I'm quite conservative here: It's a very bad idea to adapt a language
or script to the computer.  It should be the opposite.


    Werner

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
