Re: Unicode issues

J.Pietschmann Mon, 15 Jan 2007 07:42:39 -0800

Manuel Mall wrote:

Font selection in combination with character substitution. Ligatures
and character shaping.
Joerg, can you elaborate on this for me please.


Fonts may contain glyphs for precomposed Unicode characters, or they
may not. If a list of fonts is searched for a glyph of a character,
it may be useful to look for
- glyphs for the encoded value (which needs the "Grapheme Cluster
  Boundaries" stuff from UAX#29)
- glyphs for the fully decomposed form (UAX#15 NFD)
- glyphs for maximal composition (UAX#15 NFC)
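
A rough sketch of such a lookup, in Python with the standard unicodedata module. The font model (a name plus the set of code points the font has glyphs for) and the name find_font are made up for illustration and are not FOP's font API. The caller is assumed to have already split the text into grapheme clusters per UAX#29, which the Python standard library does not do.

```python
import unicodedata

def find_font(cluster, fonts):
    """Return (font_name, form) for the first font that can render `cluster`.

    `fonts` is a hypothetical list of (name, set_of_code_points) pairs.
    Each font is tried with the cluster as encoded, then fully decomposed
    (NFD), then maximally composed (NFC).
    """
    forms = [
        cluster,
        unicodedata.normalize("NFD", cluster),
        unicodedata.normalize("NFC", cluster),
    ]
    for name, coverage in fonts:
        for form in forms:
            # The font qualifies if it has a glyph for every code point.
            if all(ord(ch) in coverage for ch in form):
                return name, form
    return None  # no font in the list covers any equivalent form
```

With this, a font that only has "a" and the combining diaeresis is still found for a precomposed "ä", and a font that only has the precomposed glyph is found for the two-codepoint spelling.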

As for Ligatures and character shaping: an algorithm for automatically
detecting ligature points may use a pattern lookup similar to the
pattern based hyphenation. The pattern dictionary should store only
either NFD or NFC forms, for the same reason this is advisable for
hyphenation.

In Unicode an 'umlaut' can be represented as 1 or 2 codepoints. What in your opinion should fop do either a codepoint which can be split into two or vice versa?


We should choose either NFD or NFC as a canonical representation for
hyphenation patterns (and, in the future, for similar things), so that
hyphenation patterns containing umlauts can be found regardless of
the representation of the umlaut in the source file. Currently, we
don't care much, which works but may break suddenly.
There is obviously a slight space vs. run time tradeoff (NFC ought to
be more compact but NFC'ing the source text may be more expensive
than NFD'ing).
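
To illustrate, a minimal Python sketch using the standard unicodedata module. It reduces the pattern dictionary to a plain set of strings, which real hyphenation patterns are not, and the names FORM, canon, build_pattern_set and has_pattern are invented for this example. The point is only that the patterns and the text being looked up go through the same normalization.

```python
import unicodedata

# Assumption: which form is chosen matters less than using it consistently.
FORM = "NFC"

def canon(text):
    """Bring text into the single canonical form used by the dictionary."""
    return unicodedata.normalize(FORM, text)

def build_pattern_set(patterns):
    """Normalize every pattern once, when the dictionary is loaded."""
    return {canon(p) for p in patterns}

def has_pattern(pattern_set, fragment):
    """Normalize the source text fragment the same way before lookup."""
    return canon(fragment) in pattern_set
```

A pattern stored with a decomposed umlaut is then found whether the source file spells the umlaut with one code point or two.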

I noticed that PDF prints a # for a word joiner for example.


Ouch!

That's why I thought that most Cf code points should be dealt with in layout and not be passed to the renderers.


It depends on the features of the target format. After all, PDF viewers
do kerning and some paragraph typesetting (e.g. line centering) by
themselves if properly instructed. The SVG flow text also has some
"somewhat higher level" functionality, which users might prefer to be
used. Unfortunately, all this has potential to complicate the FOP
layout.
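
A minimal Python sketch of a per-target filter, under the assumption that layout has already acted on the meaning of the format characters (for example, not breaking a line at a word joiner). The function name and the keep parameter are hypothetical. keep stands for whatever set of Cf code points a given output format is known to handle itself.

```python
import unicodedata

def strip_format_chars(text, keep=frozenset()):
    """Drop code points of general category Cf unless listed in `keep`."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cf" or ch in keep
    )
```

Called with the default empty keep set, this removes a word joiner (U+2060) before the text reaches the renderer. A renderer for a format with higher-level text features could pass a non-empty set instead.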

J.Pietschmann
