As you know, I am looking into white space handling, and this has now expanded into Unicode line breaking, the handling of Unicode formatting characters (e.g. ZWSP), getting a handle on all the different break scenarios and their related Knuth sequences, glyph merging / substitution (which Joerg threw into the mix), and then l-r writing modes and BIDI.
What I observed is that most of these issues cannot be solved by looking at one character at a time. They need context: very often only one neighboring character, sometimes more (e.g. a sequence of white space). More importantly, the context needed is not limited to the FO the character occurs in; it spans across FOs. This is where the current LM structures, and especially the getNextKnuthElement interface, really get in the way. Basically, one cannot create the correct Knuth sequences without the context, but the context can come from anywhere (a superior FO, a subordinate FO, or a neighboring FO). So one needs look-ahead and backtracking across all these boundaries, and that feels extremely messy.

It appears conceptually much simpler to have a single loop iterating over all the characters in a paragraph, doing all the character/glyph manipulation, word breaking (hyphenation), line breaking analysis and Knuth sequence generation in one place. An example where this is already done is the white space handling during refinement: one loop at block level, based on a recursive char iterator that supports deletion and character replacement, does the job. It is very simple and easy to understand. I have something similar in mind for inline Knuth sequence generation. The iterator would return not only the character but also the relevant formatting information for it, e.g. the font, so that the width etc. can be calculated. It may also have to indicate start/end border/padding and conditional border/padding elements.

Of course that would be quite a change internally, although it would be limited to the inline LMs and would not affect any block-level operations. The way to do this would be a branch in svn. But before I embark on such an endeavour I'd like to seek some feedback on the list. Is anyone aware of serious problems with such an approach? Has it been tried before and failed, for example?
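To make the single-loop idea a bit more concrete, here is a minimal Java (16+) sketch under heavy simplifying assumptions. Every name in it (`InlineSketch`, `Node`, `CharIterator`, `StyledChar`, `Box`, `Glue`, `buildElements`) is hypothetical and is not FOP's actual class structure or API. A depth-first iterator flattens nested inline nodes into one character stream that carries the font size with each character, and a single loop turns that stream into a simplified Knuth sequence. Penalties, hyphenation, border/padding and real font metrics are left out, and glyph and space widths are made-up fractions of the font size.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of single-loop inline Knuth sequence generation (not FOP's API). */
public class InlineSketch {
    /** Stand-in for an FO: a text leaf with a font size, or a container of children. */
    static final class Node {
        final String text;   // null for containers
        final int fontSize;  // only meaningful for text leaves
        final List<Node> children = new ArrayList<>();
        Node(String text, int fontSize) { this.text = text; this.fontSize = fontSize; }
        Node(Node... kids) { this.text = null; this.fontSize = 0; children.addAll(List.of(kids)); }
    }
    /** A character plus the formatting information needed to measure it. */
    record StyledChar(char ch, int fontSize) {}
    /** Simplified Knuth elements; penalties are omitted. */
    interface Element {}
    record Box(String word, int width) implements Element {}
    record Glue(int width) implements Element {}

    /** Depth-first iterator that flattens nested nodes into one character stream. */
    static final class CharIterator implements Iterator<StyledChar> {
        private final Deque<Node> stack = new ArrayDeque<>();
        private Node current;  // text leaf currently being read
        private int pos;       // next index within current.text
        CharIterator(Node root) { stack.push(root); advance(); }

        /** Skips forward to the next text leaf that still has characters left. */
        private void advance() {
            while ((current == null || pos >= current.text.length()) && !stack.isEmpty()) {
                Node n = stack.pop();
                if (n.text != null) {
                    current = n;
                    pos = 0;
                } else {
                    // Push children in reverse so they are visited in document order.
                    for (int i = n.children.size() - 1; i >= 0; i--) stack.push(n.children.get(i));
                }
            }
        }
        @Override public boolean hasNext() { return current != null && pos < current.text.length(); }
        @Override public StyledChar next() {
            if (!hasNext()) throw new NoSuchElementException();
            StyledChar c = new StyledChar(current.text.charAt(pos++), current.fontSize);
            advance();
            return c;
        }
    }

    /** One loop over the whole paragraph: collapses white space and emits boxes and glue. */
    static List<Element> buildElements(Node paragraph) {
        List<Element> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int wordWidth = 0, spaceWidth = 0;
        boolean pendingSpace = false;  // context: was the previous character white space?
        for (CharIterator it = new CharIterator(paragraph); it.hasNext(); ) {
            StyledChar sc = it.next();
            if (Character.isWhitespace(sc.ch())) {
                if (word.length() > 0) {  // a word ends here, whichever node it started in
                    out.add(new Box(word.toString(), wordWidth));
                    word.setLength(0);
                    wordWidth = 0;
                }
                if (!pendingSpace) {  // only the first character of a white space run counts
                    pendingSpace = true;
                    spaceWidth = sc.fontSize() / 4;  // assumed space width
                }
            } else {
                // No glue before the first box, so leading white space is dropped.
                if (pendingSpace && !out.isEmpty()) out.add(new Glue(spaceWidth));
                pendingSpace = false;
                word.append(sc.ch());
                wordWidth += sc.fontSize() / 2;  // assumed glyph advance
            }
        }
        // Flush the last word; trailing white space never produces glue.
        if (word.length() > 0) out.add(new Box(word.toString(), wordWidth));
        return out;
    }

    public static void main(String[] args) {
        // Roughly: <block><inline>Hel</inline><inline>lo  </inline><inline> world</inline></block>
        // with font sizes 10, 20 and 10 respectively.
        Node para = new Node(new Node("Hel", 10), new Node("lo  ", 20), new Node(" world", 10));
        System.out.println(buildElements(para));
    }
}
```

Running `main` prints `[Box[word=Hello, width=35], Glue[width=5], Box[word=world, width=25]]`. The point of the sketch is that the loop sees the whole paragraph: "Hello" is split across two nodes with different font sizes yet still becomes one box, and the run of three spaces that crosses a node boundary collapses to a single glue, with no look-ahead or backtracking between per-node producers.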
Do those who designed the current getNextKnuth approach have arguments for why changing it for inline LMs would be a bad idea? Any other views or concerns?

Thanks
Manuel
