Manuel Mall wrote:
What I observed is that most of these issues cannot be solved by looking
at a single character at a time. They need context, very often only one
character, sometimes more (e.g. a sequence of white space). More
importantly, the context needed is not limited to the fo they occur in:
they all span across fos. This is where the current LM structures, and
especially the getNextKnuthElement interface, really get in the way.
Basically one cannot create the correct Knuth sequences without the
context, but the context can come from anywhere (a superior fo, a
subordinate fo, or a neighboring fo). So one needs look-ahead and
backtracking across all these boundaries, and it feels extremely messy.
It appears conceptually much simpler to have a single loop iterating
over all the characters in a paragraph, doing all the character/glyph
manipulation, word breaking (hyphenation), and line breaking analysis,
and generating the Knuth sequences in one place. An example where this
is currently done is the white-space handling during refinement: one
loop at block level, based on a recursive char iterator that supports
deletion and character replacement, does the job. Very simple and easy
to understand. I have something similar in mind for inline Knuth
sequence generation. Of course the iterator would not only return the
character but also the relevant formatting information for it, e.g. the
font, so that the width etc. can be calculated. The iterator may also
have to indicate start/end border/padding and conditional border/padding
elements.
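To make the iterator idea concrete, here is a minimal sketch in Java. All names here (FormattedCharIterator, CharContext, WhiteSpaceDemo) are hypothetical illustrations, not existing FOP classes; the white-space collapsing loop stands in for the refinement pass described above, and the context object stands in for whatever resolved formatting information the real iterator would carry:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical context carried with each character: enough resolved
    formatting information to compute widths, borders, etc. */
class CharContext {
    final char ch;
    final String fontName;  // resolved font for this character
    final int fontSize;     // e.g. in millipoints

    CharContext(char ch, String fontName, int fontSize) {
        this.ch = ch;
        this.fontName = fontName;
        this.fontSize = fontSize;
    }
}

/** Hypothetical iterator over all characters of a block, across fo
    boundaries, supporting the deletion and replacement operations
    used by the refinement white-space loop. */
class FormattedCharIterator {
    private final List<CharContext> chars;
    private int pos = -1;

    FormattedCharIterator(List<CharContext> chars) { this.chars = chars; }

    boolean hasNext() { return pos + 1 < chars.size(); }

    CharContext next() { return chars.get(++pos); }

    /** Remove the character last returned by next(). */
    void remove() { chars.remove(pos--); }

    /** Replace the character last returned by next(), keeping its context. */
    void replace(char c) {
        CharContext old = chars.get(pos);
        chars.set(pos, new CharContext(c, old.fontName, old.fontSize));
    }
}

public class WhiteSpaceDemo {
    /** One block-level loop: collapse runs of white space to a single
        space, roughly in the spirit of XSL-FO white-space handling. */
    static String collapse(FormattedCharIterator it) {
        StringBuilder sb = new StringBuilder();
        boolean prevSpace = false;
        while (it.hasNext()) {
            CharContext c = it.next();
            if (Character.isWhitespace(c.ch)) {
                if (prevSpace) { it.remove(); continue; } // drop repeated space
                it.replace(' ');                           // normalize to ' '
                prevSpace = true;
                sb.append(' ');
            } else {
                prevSpace = false;
                sb.append(c.ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<CharContext> chars = new ArrayList<>();
        for (char c : "Hello   \t world".toCharArray()) {
            chars.add(new CharContext(c, "Helvetica", 12000));
        }
        System.out.println(collapse(new FormattedCharIterator(chars)));
        // prints: Hello world
    }
}
```

The point of the sketch is only that a single loop sees every character with its context, so decisions that span fo boundaries need no look-ahead or backtracking across LM calls.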
I think there are two different "layers" that affect the generation of
the elements. One is the "text layer" (or maybe semantic layer), where
we have the text and can easily handle white space, recognize word
boundaries, and find hyphenation points, regardless of the actual fo
(and its depth) where the text lives. The other is the "formatting
layer", where we have the resolved values for properties like font,
size, borders, etc. These layers speak different languages: one knows
words and spaces, the other elements and attributes.
At the moment, the getNextKnuthElements() method works at the formatting
layer: each LM knows the relevant properties but has a limited view of
the text, hence the current difficulties.
Your proposal is to work at the text layer (correct me if I'm wrong),
with the LineLM centralizing the handling of the text for a whole block.
I wonder if, doing so, we would not find it difficult to know the
resolved property values applying to each piece of text.
I'm not saying that we don't need changes in the LM interactions; I'm
just asking myself (and asking you all, of course :-)) whether it is
really possible to have both breaking and element generation *in one
place*.
What if we first had centralized control at the text layer (the LineLM
putting together all the text, finding words, normalizing spaces,
performing hyphenation ...) and then localized element generation (each
LM, building on what the LineLM did and using its local properties)?
Something somewhat similar (though limited to single words) happens at
the moment with the getChangedKnuthElements() method, which is called
only after the LineLM has reconstructed a word, found its breaking
points and told the inline LMs where the breaks are.
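The two-phase idea above could be sketched roughly as follows. Everything here is hypothetical (TwoPhaseDemo, findBreaks, generateElements are illustrations, not the actual FOP LM interfaces): a first, fo-independent pass analyzes the whole paragraph's text and finds break opportunities; a second pass has each local generator (standing in for an inline LM) emit elements for its own slice using its own resolved properties:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-phase element generation: a shared text-level
    pass, then per-slice element generation with local properties. */
public class TwoPhaseDemo {

    /** Phase 1: text-layer analysis over the whole paragraph,
        independent of the fos the text lives in. Here, simply
        treat each space as a break opportunity. */
    static List<Integer> findBreaks(String text) {
        List<Integer> breaks = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ' ') breaks.add(i);
        }
        return breaks;
    }

    /** Phase 2: local element generation for one slice, using local
        resolved properties (a bare font size stands in for the real
        property set) plus the break points found in phase 1. */
    static List<String> generateElements(String slice, int fontSize,
                                         List<Integer> breaks, int offset) {
        List<String> elements = new ArrayList<>();
        for (int i = 0; i < slice.length(); i++) {
            if (breaks.contains(offset + i)) {
                elements.add("glue(width=" + fontSize / 3 + ")");
            } else {
                elements.add("box(char=" + slice.charAt(i)
                        + ", width=" + fontSize / 2 + ")");
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        // Two "inline fos" with different resolved properties, but one
        // shared text-layer analysis over the concatenated paragraph.
        String part1 = "bold ", part2 = "plain";
        List<Integer> breaks = findBreaks(part1 + part2);

        List<String> all = new ArrayList<>();
        all.addAll(generateElements(part1, 14000, breaks, 0));
        all.addAll(generateElements(part2, 12000, breaks, part1.length()));
        System.out.println(all.size() + " elements, breaks at " + breaks);
    }
}
```

The design point is the division of labor: word finding, space normalization and hyphenation need the whole text but no properties, while widths and borders need the properties but only a local view, so each phase works entirely within one of the two "languages".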
I don't know if what I just wrote makes any sense; as I have never tried
to do what you suggest, or what I just attempted to describe, I really
look forward to seeing your code in action!
Regards
Luca