On Mon, 7 Nov 2005 10:39 pm, Luca Furini wrote:
> Manuel Mall wrote:
> > What I observed is that most of these issues cannot be solved by
> > looking at a single character at a time. They need context, very
> > often only one character, sometimes more (e.g. a sequence of white
> > space). More importantly, the context needed is not limited to the
> > fo they occur in. They all span across fos. This is where the
> > current LM structures, and especially the getNextKnuthElement
> > interface, really get in the way of things. Basically one cannot
> > create the correct Knuth sequences without the context, but the
> > context can come from anywhere (superior fo, subordinate fo, or
> > neighbouring fo). So one needs look-ahead and backtracking
> > features across all these boundaries, and it feels extremely messy.
> >
> > It appears conceptually so much simpler to have only a single loop
> > iterating over all the characters in a paragraph, doing all the
> > character/glyph manipulation, word breaking (hyphenation), line
> > breaking analysis, and generation of the Knuth sequences in one
> > place. An example where this is currently done is the white space
> > handling during refinement. One loop at block level, based on a
> > recursive char iterator that supports deletion and character
> > replacement, does the job. Very simple and easy to understand. I
> > have something similar in mind for inline Knuth sequence
> > generation. Of course the iterator would not only return the
> > character but also the relevant formatting information for it,
> > e.g. the font, so that the width etc. can be calculated. The
> > iterator may also have to indicate start/end border/padding and
> > conditional border/padding elements.
>
> I think that there are two different "layers" that affect the
> generation of the elements. One is the "text layer" (or maybe
> semantic layer), where we have the text and can easily handle
> whitespace, recognize word boundaries, and find hyphenation points,
> regardless of the actual fo (and its depth) where the text lives.
> The other is the "formatting layer", where we have the resolved
> values for properties like font, size, borders, etc. These layers
> speak different languages: one knows words and spaces, the other
> elements and attributes.
>
> At the moment, the getNextKnuthElements() method works at the
> formatting level: each LM knows the relevant properties but has a
> limited view of the text, whence the current difficulties.
>
> Your proposal is to work at the text level (correct me if I'm wrong),
> with the LineLM centralizing the handling of the text for a whole
> block. I wonder whether, doing so, we would not find it difficult
> to know the resolved property values applying to each piece of text.
>
> I'm not saying that we don't need changes in the LM interactions;
> I'm just asking myself (and asking you all, of course :-)) if it
> is really possible to have both breaking and element generation *in
> one place*.
>
> What if we had first a centralized control at the text level (the
> LineLM putting together all the text, finding words, normalizing
> spaces, performing hyphenation ...) and then a localized element
> generation (each LM, based on what the LineLM did and using the
> local properties)?
>
> Something somewhat similar (but limited to single words) happens at
> the moment with the getChangedKnuthElements() method, which is called
> only after the LineLM has reconstructed a word, found its breaking
> points and told the inline LMs where the breaks are.
>
> I don't know if what I just wrote makes any sense; as I have never
> tried to do what you suggest or what I just attempted to describe,
> I really look forward to seeing your code in action!
>
Luca,

yes, what you wrote makes sense, and I am not at the coding stage yet. 
So don't hold your breath with respect to seeing new code from me - 
you may turn blue in the face. I am still trying to get my head around 
all the possible issues. I think your suggestion has quite a few 
merits. To rephrase it in my words: we do a text processing stage 
which precedes getNextKnuthElements and (among other things) 
determines all the break possibilities. This list is then given to the 
LMs as part of the getNextKnuthElements call, and the LMs can build 
the Knuth elements based on their local knowledge (properties) plus 
the already calculated break possibilities. We may even be able to do 
that during the refinement (white space handling) loop, thereby 
keeping repeated iterations over the text to a minimum.
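
A toy sketch of that two-stage split, just to make the idea concrete 
(all names and the fixed per-character width are illustrative stand-ins 
for real resolved font properties, not actual FOP code):

```java
import java.util.*;

public class TwoPhaseSketch {
    // Phase 1: a text-level pass over the whole paragraph, recording
    // legal break positions (here simply: at each space), independent
    // of which fo each character belongs to.
    static List<Integer> findBreaks(String paragraph) {
        List<Integer> breaks = new ArrayList<>();
        for (int i = 0; i < paragraph.length(); i++) {
            if (paragraph.charAt(i) == ' ') breaks.add(i);
        }
        return breaks;
    }

    // Phase 2: element generation using only local knowledge (a fixed
    // per-character width standing in for resolved font properties)
    // plus the precomputed break list from phase 1.
    static List<String> buildElements(String paragraph,
                                      List<Integer> breaks,
                                      int charWidth) {
        List<String> elements = new ArrayList<>();
        int start = 0;
        for (int b : breaks) {
            elements.add("box(" + (b - start) * charWidth + ")"); // the word
            elements.add("glue(" + charWidth + ")");              // breakable space
            start = b + 1;
        }
        elements.add("box(" + (paragraph.length() - start) * charWidth + ")");
        return elements;
    }

    public static void main(String[] args) {
        String para = "one two three";
        List<String> elems = buildElements(para, findBreaks(para), 10);
        System.out.println(elems);
        // [box(30), glue(10), box(30), glue(10), box(50)]
    }
}
```

In a real implementation each box would of course be produced by the 
LM owning that stretch of text, with its own resolved font.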

I like the sound of this as it retains lots of what we have while 
addressing the need to analyse text across fo boundaries.

> Regards
>      Luca

Thanks

Manuel
