As you know, I am looking into white space handling, and this has now
expanded into Unicode line breaking, handling of Unicode formatting
characters (e.g. ZWSP), and getting a handle on all the different break
scenarios and their related Knuth sequences. Joerg threw glyph
merging / substitution into the mix, and then we have l-r writing modes
and BIDI.

What I observed is that most of these issues cannot be solved by looking
at a single character at a time. They need context: very often only one
character, sometimes more (e.g. a sequence of white space). More
importantly, the context needed is not limited to the fo they occur in.
They all span across fos. This is where the current LM structures, and
especially the getNextKnuthElement interface, really get in the way of
things. Basically one cannot create the correct Knuth sequences without
the context, but the context can come from anywhere (superior fo,
subordinate fo, or neighboring fo). So one needs look-ahead and
backtracking features across all these boundaries, and it feels
extremely messy.

It appears conceptually so much simpler to have only a single loop
iterating over all the characters in a paragraph, doing all the
character/glyph manipulation, word breaking (hyphenation), line
breaking analysis and generation of the Knuth sequences in one place.
An example where this is currently done is the white space handling
during refinement. One loop at block level, based on a recursive char
iterator that supports deletion and character replacement, does the job.
Very simple and easy to understand. I have something similar in mind
for inline Knuth sequence generation. Of course the iterator would not
only return the character but also the relevant formatting information
for it, e.g. the font, so the width etc. can be calculated. The iterator
may also have to indicate start/end border/padding and conditional
border/padding elements.
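
To make the idea a bit more concrete, here is a rough sketch of what
such an iterator might look like. All names in it (Traits, Run,
ParagraphCharIterator, collapseSpaces) are made up for illustration and
are not existing FOP classes. It also assumes the nested fos have
already been flattened into a list of text runs, which glosses over the
recursive part and the border/padding markers. The loop only collapses
white space, but it shows the point: a single pass sees context across
fo boundaries.

```java
import java.util.List;
import java.util.NoSuchElementException;

final class ParagraphCharIteratorSketch {

    /** Hypothetical formatting info reported alongside each character. */
    record Traits(String font, int sizeMpt) {}

    /** Hypothetical text run: the characters of one fo plus its traits. */
    static final class Run {
        final StringBuilder text;
        final Traits traits;
        Run(String text, Traits traits) {
            this.text = new StringBuilder(text);
            this.traits = traits;
        }
    }

    /** Walks all characters of a paragraph, crossing run (fo) boundaries. */
    static final class ParagraphCharIterator {
        private final List<Run> runs;
        private int run = 0;       // run holding the next character
        private int pos = 0;       // index of the next character in that run
        private int lastRun = -1;  // location of the character last returned
        private int lastPos = -1;

        ParagraphCharIterator(List<Run> runs) { this.runs = runs; }

        private void skipExhaustedRuns() {
            while (run < runs.size() && pos >= runs.get(run).text.length()) {
                run++;
                pos = 0;
            }
        }

        boolean hasNext() {
            skipExhaustedRuns();
            return run < runs.size();
        }

        char next() {
            if (!hasNext()) throw new NoSuchElementException();
            lastRun = run;
            lastPos = pos;
            return runs.get(run).text.charAt(pos++);
        }

        /** Traits of the character last returned by next(). */
        Traits traits() { return runs.get(lastRun).traits; }

        /** Look-ahead: the next character without consuming it, -1 at the end. */
        int peek() { return hasNext() ? runs.get(run).text.charAt(pos) : -1; }

        /** Replaces the character last returned by next(). */
        void replace(char c) {
            if (lastPos < 0) throw new IllegalStateException();
            runs.get(lastRun).text.setCharAt(lastPos, c);
        }

        /** Deletes the character last returned by next(). */
        void remove() {
            if (lastPos < 0) throw new IllegalStateException();
            runs.get(lastRun).text.deleteCharAt(lastPos);
            if (lastRun == run) pos--;  // later characters in this run moved left
            lastPos = -1;
        }
    }

    /** One loop over the whole paragraph: collapse white space across runs. */
    static String collapseSpaces(List<Run> runs) {
        ParagraphCharIterator it = new ParagraphCharIterator(runs);
        boolean prevSpace = false;
        while (it.hasNext()) {
            char c = it.next();
            boolean space = c == ' ' || c == '\n' || c == '\t';
            if (space && prevSpace) it.remove();
            else if (space) it.replace(' ');
            prevSpace = space;
        }
        StringBuilder out = new StringBuilder();
        for (Run r : runs) out.append(r.text);
        return out.toString();
    }

    public static void main(String[] args) {
        Traits regular = new Traits("Helvetica", 10000);
        Traits bold = new Traits("Helvetica-Bold", 10000);
        List<Run> runs = List.of(new Run("Hello ", regular),
                                 new Run(" world", bold),
                                 new Run("  !", regular));
        System.out.println(collapseSpaces(runs));
    }
}
```

The space that ends the first run and the one that starts the second
are collapsed even though they belong to different fos. In the real
thing the same loop would call traits() to get the font for each
character and emit Knuth elements instead of just editing the text.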

Of course that would be quite a change internally, although limited to
inline LMs and not affecting any block level operations. The way to do
this would be a branch in svn. But before I embark on such an endeavour
I'd like to seek some feedback on the list. Is anyone aware of serious
problems with such an approach? Has it been tried before and failed, for
example? Those who designed the current getNextKnuth approach may have
arguments why changing it for inline LMs is a bad idea. Any other
views / concerns?

Thanks

Manuel
