Manuel Mall wrote:

What I observed is that most of these issues cannot be solved by looking at a single character at a time. They need context, very often only one character, sometimes more (e.g. a sequence of white space). More importantly, the context needed is not limited to the fo they occur in: they all span across fos. This is where the current LM structures, and especially the getNextKnuthElement interface, really get in the way of things. Basically one cannot create the correct Knuth sequences without the context, but the context can come from anywhere (superior fo, subordinate fo, or neighboring fo). So one needs look-ahead and backtracking features across all these boundaries, and it feels extremely messy.


It appears conceptually so much simpler to have only a single loop iterating over all the characters in a paragraph, doing all the character/glyph manipulation, word breaking (hyphenation), line breaking analysis and generation of the Knuth sequences in one place. An example where this is currently done is the white space handling during refinement. One loop at block level, based on a recursive char iterator that supports deletion and character replacement, does the job. Very simple and easy to understand. I have something similar in mind for inline Knuth sequence generation. Of course the iterator would not only return the character but the relevant formatting information for it as well, e.g. the font, so the width etc. can be calculated. The iterator may also have to indicate start/end border/padding and conditional border/padding elements.
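The kind of mutable, block-wide iterator described above might look roughly like the following sketch (all class and method names here are invented for illustration, not FOP's actual API; a flat list stands in for the recursive walk over nested inline FOs, and a fixed font key stands in for real formatting context):

```java
import java.util.ArrayList;
import java.util.List;

public class CharIteratorSketch {

    /** One character together with the formatting context it came from. */
    static class StyledChar {
        char ch;
        String fontKey; // enough to look up glyph widths, etc.
        StyledChar(char ch, String fontKey) { this.ch = ch; this.fontKey = fontKey; }
    }

    /** Minimal mutable iterator; a flat list stands in for a recursive
     *  walk across nested inline FOs. */
    static class RecursiveCharIterator {
        private final List<StyledChar> chars;
        private int pos = -1;
        RecursiveCharIterator(List<StyledChar> chars) { this.chars = chars; }
        boolean hasNext() { return pos + 1 < chars.size(); }
        StyledChar next() { return chars.get(++pos); }
        void remove() { chars.remove(pos--); }
    }

    /** Collapse runs of white space to a single space, block-wide,
     *  in one pass over the iterator. */
    static String collapseSpaces(String text) {
        List<StyledChar> chars = new ArrayList<>();
        for (char c : text.toCharArray()) chars.add(new StyledChar(c, "F1"));

        RecursiveCharIterator it = new RecursiveCharIterator(chars);
        boolean prevWasSpace = false;
        while (it.hasNext()) {
            boolean isSpace = it.next().ch == ' ';
            if (isSpace && prevWasSpace) it.remove();
            prevWasSpace = isSpace;
        }
        StringBuilder sb = new StringBuilder();
        for (StyledChar sc : chars) sb.append(sc.ch);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapseSpaces("a  b   c")); // prints "a b c"
    }
}
```

The point of the sketch is that the loop sees the whole block's text at once, so context-dependent decisions (here, space collapsing) need no cross-LM look-ahead.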

I think that there are two different "layers" that affect the generation of the elements: one is the "text layer" (or maybe semantic layer), where we have the text and can easily handle white space, recognize word boundaries and find hyphenation points, regardless of the actual fo (and its depth) where the text lives; the other is the "formatting layer", where we have the resolved values for properties like font, size, borders, etc. These layers speak different languages: one knows words and spaces, the other elements and attributes.

At the moment, the getNextKnuthElements() method works at the formatting level: each LM knows the relevant properties but has a limited view of the text, whence the current difficulties.

Your proposal is to work at the text level (correct me if I'm wrong), with the LineLM centralizing the handling of the text for a whole block. I wonder if, doing so, we would not find it difficult to know the resolved property values applying to each piece of text.

I'm not saying that we don't need changes in the LM interactions; I'm just asking myself (and asking you all, of course :-)) if it is really possible to have both breaking and element generation *in one place*.

What if we had first a centralized control at the text level (the LineLM putting together all the text, finding words, normalizing spaces, performing hyphenation ...) and then a localized element generation (each LM, building on what the LineLM did and using its local properties)?
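A minimal sketch of that two-phase idea might look like this (again, all names are invented for illustration; strings stand in for Knuth boxes, and splitting on spaces stands in for real break-opportunity analysis):

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseSketch {

    /** Phase 1: centralized text-level analysis, paragraph-wide.
     *  The LineLM would find break opportunities for all the text. */
    static List<String> splitAtBreaks(String paragraph) {
        List<String> fragments = new ArrayList<>();
        for (String frag : paragraph.split(" ")) fragments.add(frag);
        return fragments;
    }

    /** Phase 2: localized element generation. Each inline LM turns the
     *  fragments it owns into elements using its own resolved
     *  properties; a string stands in for a KnuthBox here. */
    static String makeBox(String fragment, String font) {
        return "box(" + fragment + "," + font + ")";
    }

    public static void main(String[] args) {
        for (String f : splitAtBreaks("centralized text analysis")) {
            System.out.println(makeBox(f, "Times-12"));
        }
    }
}
```

The attraction is that phase 1 has the full-paragraph context the current getNextKnuthElements() lacks, while phase 2 keeps property resolution local to each LM, where it already lives.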

Something somewhat similar (but limited to single words) happens at the moment with the getChangedKnuthElements() method, which is called only after the LineLM has reconstructed a word, found its breaking points and told the inline LMs where the breaks are.

Don't know if what I just wrote makes any sense; as I never tried to do what you suggest, or what I just attempted to describe, I really look forward to seeing your code in action!

Regards
    Luca
