On Thu, 2 Jan 2003, Edward Cherlin wrote:
> On Sunday 24 November 2002 06:22 pm, Gaspar Sinai wrote:
> > Hi,
> > I have a new rendering scheme in mind that I am going to
> > implement in a future version of Yudit.
>
> I'm going to request that you not do this. Please concentrate your efforts on
> Yudit, which is my editor of choice. Please consider using Pango or Graphite
> instead. At least talk to the developers about the issues they see in
> implementing a rendering engine and modifying the OpenType format.

It is very unlikely that I will have enough time for this, so don't
worry :) Of course, Yudit enhancements are the highest priority
now. Thank you very much for using Yudit; I am very happy that I
created something useful. I wish I had more time for it. If I did,
it would probably not suffer from the effects of 'least effort'
design...

> > The major problems with it are:
> > ① It is very difficult to test the rendering software. Features
> >   need to be applied in a certain order. Even one mix-up will
> >   result in bad rendering, and a very complex test has to be
> >   performed manually to catch the bug.
>
> Any rendering engine will be difficult to test, because of the complexity of
> the fundamental processes. If not OT features, the same capabilities have to
> be implemented some other way.

What I miss most is automated testing. This is what
I would like to have with a reversible rendering algorithm.

> > ② OpenType tables cannot be shared between two fonts of the
> >   same type, although similar positioning/substitution needs
> >   to be performed. This makes the font file unnecessarily big.
>
> If this is a real issue, then we can devise a way to share the tables.
> I don't think it is an issue. Most fonts are well under 100K,
> except CJK.

I would be very happy if tables could be shared. This would
be very useful when multiple fonts are used on a system with
little physical memory.

>
> > ③ OpenType is not an Open Standard.
>
> Defining a new font format might be a good idea, if we can get the designers
> of font servers and rendering engines to agree on it.
>
> > ④ Rendering is a non-reversible process
>
> True regardless of the engine.

I would argue with this. With a small and distinct set of glyphs,
especially when the script is known, even the screen image could be
converted back to Unicode. Of course, this would require more
effort, but it is not impossibly difficult in the way that, say,
translating into a different language is.

But we don't need to go that far: it would be enough to convert
the Glyph components back into the original input (see the Glyph
component definition later in this mail; I do not have better
terminology. In OTF they are simply called Glyphs, even though
they may be just part of a cluster). What is the right word for
these tiny components that make up the clusters? Something like
a variant of a JAMO that appears at a specific place.

When writing a program I usually put very high emphasis on
testability. (Well, this is mostly true at work; for Yudit
I have only a few hours of spare time, or even less.) I try to
increase testability by adhering to these principles:

- Using independent, reversible algorithms to facilitate
  automatic testing.

- Using algorithms that have very few branches. If they
  have some, use algorithms where each branch will execute at least
  once during testing. Replace a branch with a mathematical formula
  that has global effects if it is wrong.

- Mixing independent implementations of routines randomly with
  parallel path execution so that any small bug would have a
  global effect. This is just like cutting a wire in a multiplier
  circuit: multiplying random numbers would show this immediately,
  and if you test it with a pattern designed to expose cut and
  short-circuited wires, the error shows up even faster.
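The multiplier analogy in the last principle can be sketched in a few
lines of Python. This is a simplified illustration, not the full
parallel-path scheme: two independent implementations (a shift-and-add
multiplier and Python's operator.mul) are compared on random inputs,
and a deliberately "cut wire" variant (the hypothetical mul_cut_wire,
which ignores bit 3 of one operand) is caught almost at once.

```python
import operator
import random
from typing import Callable

Mul = Callable[[int, int], int]

def mul_shift_add(a: int, b: int) -> int:
    """Independent multiplier for non-negative ints: shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result += a
        a <<= 1
        b >>= 1
    return result

def mul_cut_wire(a: int, b: int) -> int:
    """Deliberately broken: bit 3 of b is 'cut' and always reads as 0."""
    return mul_shift_add(a, b & ~0b1000)

def differential_test(impl: Mul, reference: Mul, trials: int = 1000, seed: int = 0) -> bool:
    """Compare two implementations on random 16-bit inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randrange(1 << 16), rng.randrange(1 << 16)
        if impl(a, b) != reference(a, b):
            return False
    return True

print(differential_test(mul_shift_add, operator.mul))  # True
print(differential_test(mul_cut_wire, operator.mul))   # False
```

Roughly half of all random operands have bit 3 set, so the broken
wire shows up within the first few trials.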

So returning to the point: I think it is possible to design
a rendering engine that can be tested. Even a Pentium processor
can be tested pretty well, even though it contains a couple of
orders of magnitude more gates than there are characters
in Unicode.

Talking about hardware: has anyone made a hardware-accelerated
Unicode rendering engine yet? This would have enormous effects.

> > The idea is:
> > Ⅰ. Assign codes and hot spots for all possible Glyph components,
> >   per script, per language system.
>
> Impossible. You can't predefine all glyphs for a language, much less for a
> script.
>
> The first counterexample to consider is Korean, where proper rendering has to
> reshape and reposition glyphs in a two-dimensional context.  There are about
> 40 letters, each of which appears in hundreds of slight variations, and it is
> not difficult to make up syllables which would require new variations.
>
> Next consider Arabic-script languages, where the set of ligatures is
> language- and font-specific, and remains highly variable.
>
> Indic scripts have similar problems about the variety of ligatures.
>
> When stacking combining characters, whether multiply-accented Latin, math
> symbols, or Sanskrit written in the Tibetan alphabet, it has not proven
> possible to list all of the possibilities. Some of the positioning and
> shaping has to be done dynamically.

By Glyph elements I mean the smallest components of a cluster.
Rendering would convert Unicode into glyph elements and positions;
testing would use these elements and positions to convert back
into Unicode, which should give the same or an equivalent Unicode
stream.


Render: U+XXXX U+YYYY U+ZZZZ R-> G+MMMM G+NNNN
Test:   G+MMMM G+NNNN T-> U+XXXX U+YYYY U+ZZZZ or its equivalent.
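As a toy illustration of the R-> / T-> pair above, assuming a glyph
component is just a (slot, jamo) pair, the Python sketch below renders
precomposed Hangul syllables into components using the syllable
arithmetic defined in the Unicode Standard, maps them back, and checks
the round trip. The decoded text comes back as conjoining jamo rather
than precomposed syllables, so "equivalent" is checked with NFC
normalization. The slot names and the render/unrender functions are
hypothetical; a real engine would also emit hot spot positions.

```python
import unicodedata
from typing import List, Tuple

# Hangul syllable arithmetic as defined in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables

# Hypothetical glyph component: (slot, jamo). The slot stands in for the
# hot spot where the component would be drawn inside the syllable cell.
Component = Tuple[str, str]

def render(text: str) -> List[Component]:
    """R->: Unicode in, glyph components out (toy renderer)."""
    out: List[Component] = []
    for ch in text:
        s = ord(ch) - S_BASE
        if 0 <= s < S_COUNT:
            out.append(("initial", chr(L_BASE + s // N_COUNT)))
            out.append(("medial", chr(V_BASE + (s % N_COUNT) // T_COUNT)))
            if s % T_COUNT:  # a zero index means "no final consonant"
                out.append(("final", chr(T_BASE + s % T_COUNT)))
        else:
            out.append(("base", ch))  # other characters pass through
    return out

def unrender(components: List[Component]) -> str:
    """T->: components back to Unicode, as conjoining jamo."""
    return "".join(ch for _slot, ch in components)

def round_trip_ok(text: str) -> bool:
    """Equivalence, not identity: compare canonical compositions (NFC)."""
    back = unrender(render(text))
    return unicodedata.normalize("NFC", back) == unicodedata.normalize("NFC", text)

print(round_trip_ok("한국어 text"))  # True
```

Because the check needs no human eye, it can be run over every
precomposed syllable in a loop, which is exactly the kind of automated
test a reversible design allows.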

For Korean, just a few variants of each JAMO are needed: they can sit in
just a few different places, with different sizes. Tibetan Glyph
elements could also have just a few places and sizes.

For Indic and Arabic, the problem I see is the need for
compulsory ligature definitions.

In an OTF these Glyph elements are defined anyway: there is not
an endless number of Glyphs, but the number of combinations is huge,
just as in our case.

> > Ⅱ. Create a generic state machine that can step through the input
> >   Unicode characters and spit out Glyph components and their relative
> >  hot spot positions.
> > Ⅲ. Create states and a dynamically loadable state table per script
> >  per language system.
>
> Per font, in some scripts.

This is where the difficulty comes in: there is currently no standard
that says which ligatures are compulsory and which are not.
I suggest these should be defined. For English speakers,
the simple fi ligature is what the examples show, and most people
think it is as simple as that. Changing the font will render the fi
differently.

There are a lot of combinations that, unlike fi, look completely
different when ligated. They do not even remotely resemble the
unligated components. If we rely on the font, authors cannot
trust plain text files; they must use word processors to store
the font information in the file to avoid unwanted ligation with
a different font. Plain text without compulsory ligature definitions
is meaningless, IMHO.
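If compulsory ligatures were defined per script, the decision to
ligate would no longer depend on the font. Below is a minimal Python
sketch, assuming a hypothetical table COMPULSORY and a placeholder
component name LIG_LAM_ALEF: the Arabic lam + alef sequence, which
Arabic typography treats as a mandatory ligature, is replaced in one
left-to-right longest-match pass, while fi, which is optional, is left
alone for the font to decide.

```python
from typing import Dict, List, Tuple

# Hypothetical script-level table: every listed sequence MUST be drawn as one
# ligature component, whatever font is in use. The value is a placeholder name.
COMPULSORY: Dict[Tuple[str, ...], str] = {
    ("\u0644", "\u0627"): "LIG_LAM_ALEF",  # Arabic lam + alef
}
MAX_LEN = max(len(seq) for seq in COMPULSORY)

def apply_compulsory(text: str) -> List[str]:
    """One left-to-right pass with greedy longest match."""
    out: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible sequence first, down to two characters.
        for n in range(min(MAX_LEN, len(text) - i), 1, -1):
            lig = COMPULSORY.get(tuple(text[i:i + n]))
            if lig is not None:
                out.append(lig)
                i += n
                break
        else:  # no compulsory ligature starts here
            out.append(text[i])
            i += 1
    return out

print(apply_compulsory("\u0633\u0644\u0627\u0645"))  # seen, LIG_LAM_ALEF, meem
print(apply_compulsory("fi"))                        # ['f', 'i']: left to the font
```

A font would then only have to supply a drawing for LIG_LAM_ALEF; it
could not silently decide to leave the sequence unligated.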

CJK Unification poses fewer problems: I can figure out what font
needs to be used for different portions of the text just by
looking at it. For Indic and Arabic I have no way of determining
which font has the correct ligatures to display my plain text
files the way the author intended.

As I understand it, Unicode defines scripts and code points, but
it does not go out of its way to define how they should look
on the screen. (Even glyph variant samples are not defined
for CJK in the Unicode book.)

Authors that write plain text should be aware that the text
may change considerably when viewed with a different font.

This might be similar to a handwritten document that
gets printed. But at least handwritten ligatures can appear
as ligatures in print too. If compulsory ligatures are not
defined, I cannot even write a palindrome: it may not look
like a palindrome with a different font at all. An old
document may also look strange with ligatures invented
later.

In a sense Unicode is pretty democratic: there are more
viewers than writers: readers have more freedom :)

> > Ⅳ. Create bitmap and vector fonts. The glyph codepoints are
> >  defined in (Ⅰ.) so this will be an easy process. Much easier
> >  than creating OpenType tables.
>
> Exactly equivalent to creating OpenType tables. You just make fewer copies of
> the data (assuming availability of Free/Open Source fonts).

The difference is that the font creators would have all
the Glyph components defined in a consistent way in our case.

VOLT helps a lot when creating OT fonts. But I fear it is
still possible to miss something and have one untested
input combination rendered in an unexpected way. There are
too many branches because of the exceptions, and because of
the numerous features an OTF has.


> > Ⅶ. Use (Ⅱ) and pass it to (Ⅴ) to see if we get back our stream.
> >  We can test the rendering engine on the fly this way.
> > Ⅷ. Use (Ⅴ.) for OCR (character recognition) software to scan
> >   text images into Unicode stream.
> > This is all - I am running out of Roman numerals ☺
> >
> > The merits of such a rendering/font scheme would be:
> > - Fonts do not need to carry extra tables
>
> Fewer copies, but not less data.

That is right, but this data could be shared, resulting in
less memory consumption when the system is using different
fonts.

> > - Rendering is linear and needs very little processing power.
>
> ??

This certainly calls for more explanation. First part of
the explanation: I never read my outgoing mails twice and I
never use a spell checker; sorry about this.

What I mean here is that I would like to achieve the whole
thing in one pass instead of going through the glyph substitution
process in multiple passes. Hopefully it won't look like my emails ;)
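A table-driven machine in the spirit of steps Ⅱ and Ⅲ makes this
single pass explicit. The Python sketch below is only an assumption
of how such a table could look: the table is plain data (so it could
be loaded per script at run time), the state names and slots are
hypothetical, and the only script handled is modern conjoining Hangul
jamo. Each input character is read exactly once, so the work is
linear in the length of the text.

```python
from typing import Dict, List, Tuple

Table = Dict[Tuple[str, str], Tuple[str, str]]

def classify(ch: str) -> str:
    """Character classes for modern conjoining Hangul jamo."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x1112:
        return "L"  # leading consonant
    if 0x1161 <= cp <= 0x1175:
        return "V"  # vowel
    if 0x11A8 <= cp <= 0x11C2:
        return "T"  # trailing consonant
    return "other"

# Hypothetical state table: (state, class) -> (next state, slot).
# Being plain data, it could be loaded per script at run time.
HANGUL_TABLE: Table = {
    ("start", "L"): ("after_L", "initial"),
    ("after_L", "V"): ("after_V", "medial"),
    ("after_V", "T"): ("start", "final"),
}

def run(table: Table, text: str) -> List[Tuple[str, str]]:
    """Single left-to-right pass: each character is examined exactly once."""
    state = "start"
    out: List[Tuple[str, str]] = []
    for ch in text:
        cls = classify(ch)
        # No transition from the current state: try starting a new cluster.
        step = table.get((state, cls)) or table.get(("start", cls))
        if step is None:
            state, slot = "start", "base"  # not part of any cluster
        else:
            state, slot = step
        out.append((slot, ch))
    return out

# Slots: initial, medial, final, initial, medial, final
print(run(HANGUL_TABLE, "\u1112\u1161\u11ab\u1100\u116e\u11a8"))
```

The engine itself (run) knows nothing about Korean; swapping the table
and the classifier is all it would take to support another script.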

> > - It is testable
> > - It is bitmap-font-friendly
>
> I can't imagine it. How do you do adjustable glyphs in bitmaps?

I can't see why the algorithm would not work with bitmap fonts...
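Assuming each Glyph component comes in only a few fixed sizes (as
suggested above for Korean), a bitmap font would just store small
bitmaps, and the renderer would OR them into a character cell at the
hot spots produced by the state machine. The component shapes, slot
names and offsets below are made up for illustration, and the sketch
does not address glyphs that must stretch continuously.

```python
from typing import Dict, List, Tuple

Bitmap = List[str]  # rows of '#' (ink) and '.' (background)

# Hypothetical bitmap components and the hot spot (x, y) of each slot.
COMPONENTS: Dict[str, Bitmap] = {
    "K": ["##",
          ".#"],
    "A": ["#",
          "#",
          "#"],
    "N": ["#..",
          "###"],
}
HOT_SPOTS: Dict[str, Tuple[int, int]] = {
    "initial": (0, 0),
    "medial": (3, 0),
    "final": (0, 3),
}

def composite(glyphs: List[Tuple[str, str]], width: int = 4, height: int = 5) -> Bitmap:
    """OR each component's ink into the cell at its slot's hot spot."""
    cell = [[False] * width for _ in range(height)]
    for slot, name in glyphs:
        dx, dy = HOT_SPOTS[slot]
        for y, row in enumerate(COMPONENTS[name]):
            for x, px in enumerate(row):
                if px == "#":
                    cell[dy + y][dx + x] = True  # assumes the component fits
    return ["".join("#" if ink else "." for ink in row) for row in cell]

for line in composite([("initial", "K"), ("medial", "A"), ("final", "N")]):
    print(line)
```

Nothing here needs outlines or hinting: placement is a pair of integer
offsets per slot, which is why the scheme seems bitmap-friendly to me.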

Happy New Year!

G̳á̳s̳p̳á̳r̳ ・ ɡaːʃpaːr
ガーシュパール・Гашпар・가스팔・גאשפאר・گاشپار
‎‏17-3*5


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
