Great summary, and I appreciate you taking a serious look into this 
problem. Some comments below.

On Mon, 5 Dec 2005 05:48 pm, Luca Furini wrote:
> Manuel Mall wrote:
> > After that we really need to redesign the line breaking stuff. Not
> > the Knuth approach (and the implemented algorithms related to that)
> > but the way we arrive at the Knuth sequences and iterate and
> > process the text elements. This needs to be done to be able to do
> > white-space-treatment, UAX#14 line breaking, start- /end- space
> > resolution and generally to be able to handle some more aspects of
> > Unicode (e.g. glyph merging).
> Just trying to write down a few thoughts and summarise what we have
> already said about this:
> - the inline LMs can directly apply the linefeed-treatment property
> (ignoring, preserving or transforming the LF character) but have too
> limited a "view" to correctly handle white-space-treatment and
> white-space-collapse, or to count the number of letter spaces

I would formulate the problem slightly differently:

The line breaking logic requires inspection of adjacent characters in 
the input, even if these characters are contained in different inline 
fo's, in the following cases:
a) To determine line break possibilities in accordance with the Unicode 
Annex UAX#14
b) To be able to apply the white-space-treatment property around 
FOP-generated line breaks
c) To determine word boundaries, which are used
        i) to calculate the number of letter spaces in a word
        ii) to determine the actual words presented for hyphenation
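To make case c) concrete, here is a minimal Java sketch. It is hypothetical: each String stands for the text of one inline fo (not FOP's actual classes), and the JDK's word-instance java.text.BreakIterator is used as a stand-in for proper word-boundary detection. It shows that a word split across two inline fo's is only seen whole when the text is analysed at line level.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each String stands for the text of one inline fo.
class CrossFoWords {

    /** Joins the text of adjacent inline fo's so word boundaries can span them. */
    static String collect(List<String> fragments) {
        StringBuilder sb = new StringBuilder();
        for (String f : fragments) {
            sb.append(f);
        }
        return sb.toString();
    }

    /** Splits text into words using the JDK's word BreakIterator. */
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Keep only segments containing letters or digits (skip spaces, punctuation).
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    /** Letter spaces in a word: one between each pair of adjacent characters. */
    static int letterSpaces(String word) {
        return Math.max(0, word.codePointCount(0, word.length()) - 1);
    }

    static int totalLetterSpaces(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += letterSpaces(w);
        }
        return total;
    }

    public static void main(String[] args) {
        // "format" is split across two inline fo's (e.g. by a color change).
        List<String> fragments = List.of("a fo", "rmat issue");

        // Line-level view: the word is seen whole.
        List<String> joined = words(collect(fragments));
        System.out.println(joined + " -> " + totalLetterSpaces(joined));

        // Per-fo view: each inline LM only sees its own fragment.
        int perFo = 0;
        for (String f : fragments) {
            perFo += totalLetterSpaces(words(f));
        }
        System.out.println("per-fo count -> " + perFo);
    }
}
```

For the fragments "a fo" and "rmat issue", the line-level view finds the words a, format and issue with 9 letter spaces in total, while summing per fragment gives 8, because the letter space between "fo" and "rmat" is visible to neither inline LM on its own. Hyphenation would likewise be offered "fo" and "rmat" rather than "format".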

> - the LineLM has to collect the "text" from its descendant nodes:
> non-textual objects should be taken into account too, as, for
> example, a leader between two spaces should prevent them from being
> collapsed; if spaces collapse only if they come from sibling nodes,
> this could maybe be handled during the collection by the
> InlineStackingLM
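Roughly, the collection step in this point could look like the following Java sketch. Everything in it is hypothetical: the Child record and both methods are invented for illustration, and it assumes that a non-text child such as a leader is represented in the collected text by U+FFFC OBJECT REPLACEMENT CHARACTER, so that spaces on either side of it are never adjacent and are therefore not collapsed.

```java
import java.util.List;

// Hypothetical sketch: the Line LM collects the text of its descendants and
// uses U+FFFC (OBJECT REPLACEMENT CHARACTER) for non-text objects.
class CollectWithObjects {

    static final char OBJECT = '\uFFFC';

    /** A child is either a run of text or a non-text object (text == null). */
    record Child(String text) {
        static Child object() {
            return new Child(null);
        }
    }

    /** Concatenates the children, inserting OBJECT for non-text children. */
    static String collect(List<Child> children) {
        StringBuilder sb = new StringBuilder();
        for (Child c : children) {
            sb.append(c.text() == null ? String.valueOf(OBJECT) : c.text());
        }
        return sb.toString();
    }

    /** Collapses runs of adjacent spaces; an OBJECT char interrupts a run. */
    static String collapse(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean previousIsSpace = sb.length() > 0 && sb.charAt(sb.length() - 1) == ' ';
            if (c == ' ' && previousIsSpace) {
                continue; // drop the second and later spaces of a run
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Spaces from sibling text nodes collapse...
        System.out.println(collapse(collect(List.of(new Child("a "), new Child(" b")))));
        // ...but a leader between them keeps both spaces.
        System.out.println(collapse(collect(List.of(new Child("a "), Child.object(), new Child(" b")))).length());
    }
}
```

Spaces from two sibling text nodes ("a " and " b") collapse to "a b", while the same text with an object between them keeps both spaces.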

I have a slightly different view on the handling of spaces. We only need 
to be concerned about white-space-treatment around line breaks we 
generate. Everything else is already dealt with by the time the LMs are 
invoked. This in turn means IMO we only need to know how "big" the glue 
element is that gets dropped if the Knuth algorithm actually generates a 
line break there. Determining the value for "big", however, means we 
need to consider adjacent spaces even if they are contained in 
different inline fo's.
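A minimal Java sketch of that calculation, under stated assumptions: the Fragment record is hypothetical (not FOP's API), each fragment carries the width of a space in its own font, and the millipoint widths are invented for the example. It only shows that the run of spaces around a break opportunity can span two inline fo's, so its width is not known to either LM alone.

```java
import java.util.List;

// Hypothetical sketch: each Fragment is the text of one inline fo plus the
// width (in millipoints) of a space in that fo's font. Not FOP's API.
class BreakGlue {

    record Fragment(String text, int spaceWidth) {}

    /**
     * Width of the run of spaces around the break opportunity at 'pos'
     * (0 <= pos <= total length) in the concatenated text: the glue that
     * disappears if the line breaker actually breaks there.
     */
    static int droppableWidth(List<Fragment> fragments, int pos) {
        // Flatten the fragments, remembering each character's space width.
        StringBuilder text = new StringBuilder();
        int[] widths = new int[fragments.stream().mapToInt(f -> f.text().length()).sum()];
        for (Fragment f : fragments) {
            for (int k = 0; k < f.text().length(); k++) {
                widths[text.length()] = f.spaceWidth();
                text.append(f.text().charAt(k));
            }
        }
        int width = 0;
        // Spaces before the break opportunity...
        for (int j = pos - 1; j >= 0 && text.charAt(j) == ' '; j--) {
            width += widths[j];
        }
        // ...and spaces after it, possibly in another fragment.
        for (int j = pos; j < text.length() && text.charAt(j) == ' '; j++) {
            width += widths[j];
        }
        return width;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = List.of(
                new Fragment("word ", 2500),  // trailing space, smaller font
                new Fragment(" next", 3000)); // leading space, larger font
        // Break opportunity between the two spaces, at index 5 of "word  next".
        System.out.println(droppableWidth(fragments, 5));
    }
}
```

With a trailing space of 2500 in the first fo and a leading space of 3000 in the second, the glue dropped at a break between them is 5500 millipoints; each LM on its own would only see its half.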

> - the LineLM should then mark spaces that must be removed because
> they are trailing / leading, glyphs that must be merged (but which LM
> will paint them if the characters come from different text nodes?)
> and find the breaking points according to the Unicode rules

Glyphs are only allowed to be merged if they carry the same / matching 
set of property values. Personally I would not be concerned if we 
therefore limited that logic to within a single LM. While it is possible 
that someone could write something like

    <fo:inline>a</fo:inline><fo:inline>&#x0308;</fo:inline>

and the a and &#x0308; could be combined into an &#x00e4;, IMO this is a 
pretty degenerate case.
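As a rough illustration of restricting merging to a single LM, this Java sketch uses Unicode canonical composition (NFC via java.text.Normalizer) as a character-level stand-in for glyph merging; real glyph merging happens at the font level, which this code does not model. Each String is assumed to be the text handled by one inline LM, and composition is applied per LM only.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Sketch only: NFC composition is a character-level stand-in for glyph
// merging; each String is assumed to be the text of one inline LM.
class LocalComposition {

    /** Composes each LM's text independently; nothing merges across LMs. */
    static List<String> composePerLm(List<String> fragments) {
        List<String> out = new ArrayList<>();
        for (String f : fragments) {
            out.add(Normalizer.normalize(f, Normalizer.Form.NFC));
        }
        return out;
    }

    public static void main(String[] args) {
        // One LM: "a" + U+0308 COMBINING DIAERESIS composes to U+00E4.
        System.out.println(composePerLm(List.of("a\u0308")));
        // Two LMs (the degenerate case): the pair is left uncomposed.
        System.out.println(composePerLm(List.of("a", "\u0308")));
    }
}
```

Inside one LM, "a" followed by U+0308 becomes U+00E4; split across two LMs, the pair is left as is.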

> - the LineLM should somehow pass the computed information to the
> descendant LMs, which would use it to create the correct elements
> at once

Yes it could, but I am in two minds about whether this is the best 
approach or whether the Line LM should create the Knuth sequences right 
away and store enough information in them so that during the addAreas 
phase the inline LMs can create the correct areas.
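A minimal Java sketch of that second option, assuming hypothetical Origin and Element records (not FOP's actual classes; the widths are invented): the Line LM builds the elements and tags each with the inline LM and character range it stands for, so that in the addAreas phase each inline LM only has to look up what it must render on a given line.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: elements built by the Line LM carry a back-reference
// to the inline LM and the character range they represent.
class LineLevelElements {

    /** Which inline LM, and which of its characters, an element stands for. */
    record Origin(String lmName, int start, int end) {}

    /** Minimal Knuth element: a box (isBox) or a glue, width in millipoints. */
    record Element(boolean isBox, int width, Origin origin) {}

    /** For one line, the character range each inline LM must turn into an area. */
    static Map<String, Origin> rangesPerLm(List<Element> line) {
        Map<String, Origin> result = new LinkedHashMap<>();
        for (Element e : line) {
            Origin o = e.origin();
            // Widen the range already recorded for this LM, if any.
            result.merge(o.lmName(), o, (a, b) -> new Origin(
                    a.lmName(), Math.min(a.start(), b.start()), Math.max(a.end(), b.end())));
        }
        return result;
    }

    public static void main(String[] args) {
        // "Hello " from one text LM followed by "world" from another.
        List<Element> line = List.of(
                new Element(true, 25000, new Origin("text-1", 0, 5)),
                new Element(false, 2500, new Origin("text-1", 5, 6)),
                new Element(true, 27000, new Origin("text-2", 0, 5)));
        rangesPerLm(line).forEach((lm, range) ->
                System.out.println(lm + " renders " + range.start() + ".." + range.end()));
    }
}
```

If the line breaker ended a line at that glue, the Line LM would drop the glue element before computing the ranges, leaving text-1 with only characters 0..5.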

> - the resulting sequences would be ready for the breaking phase,
> without further analysis / checks / substitutions / changes
> The revised interface for inline LMs could then have (just a quick
> idea) a new appendText(StringBuffer) method and a modified version of
> getNextKnuthElements() having some extra parameter storing the
> information created by the LineLM; we should finally get rid of
> addALetterSpaceTo(), getWordChars(), hyphenate(), applyChanges() and
> getChangedKnuthElements().
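A minimal Java sketch of this two-pass interface. Only the names appendText(StringBuffer) and getNextKnuthElements() come from the proposal; LineInfo, TextLM, lineLM() and the string-encoded elements are hypothetical, and the real FOP signatures differ. The JDK's line-instance java.text.BreakIterator stands in for a UAX#14 implementation, which it only approximates.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of a two-pass protocol between the Line LM and inline LMs.
class TwoPassInterface {

    /** What the Line LM hands to one child: shared break opportunities + offset. */
    record LineInfo(BitSet breakAfter, int offset) {}

    interface InlineLM {
        /** Pass 1: contribute this LM's characters to the paragraph text. */
        void appendText(StringBuffer sb);

        /** Pass 2: build final elements directly from the Line LM's analysis. */
        List<String> getNextKnuthElements(LineInfo info);
    }

    static class TextLM implements InlineLM {
        private final String text;

        TextLM(String text) {
            this.text = text;
        }

        @Override
        public void appendText(StringBuffer sb) {
            sb.append(text);
        }

        @Override
        public List<String> getNextKnuthElements(LineInfo info) {
            List<String> elements = new ArrayList<>();
            StringBuilder box = new StringBuilder();
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (c == ' ') {
                    flush(box, elements);
                    elements.add("glue"); // a space: breakable glue
                } else {
                    box.append(c);
                    // A break opportunity after a non-space character needs a penalty.
                    if (info.breakAfter().get(info.offset() + i + 1)) {
                        flush(box, elements);
                        elements.add("penalty");
                    }
                }
            }
            flush(box, elements);
            return elements;
        }

        private static void flush(StringBuilder box, List<String> elements) {
            if (box.length() > 0) {
                elements.add("box(" + box + ")");
                box.setLength(0);
            }
        }
    }

    /** The Line LM: collect all text, find break opportunities once, hand them out. */
    static List<List<String>> lineLM(List<InlineLM> children) {
        StringBuffer sb = new StringBuffer();
        List<Integer> offsets = new ArrayList<>();
        for (InlineLM lm : children) {
            offsets.add(sb.length());
            lm.appendText(sb);
        }
        String text = sb.toString();
        BitSet breaks = new BitSet();
        BreakIterator bi = BreakIterator.getLineInstance();
        bi.setText(text);
        for (int b = bi.next(); b != BreakIterator.DONE; b = bi.next()) {
            if (b < text.length()) {
                breaks.set(b); // ignore the boundary at the very end of the text
            }
        }
        List<List<String>> perChild = new ArrayList<>();
        for (int k = 0; k < children.size(); k++) {
            perChild.add(children.get(k).getNextKnuthElements(new LineInfo(breaks, offsets.get(k))));
        }
        return perChild;
    }

    public static void main(String[] args) {
        // "format" is split across two inline LMs.
        System.out.println(lineLM(List.of(new TextLM("a fo"), new TextLM("rmat issue"))));
    }
}
```

Every LM receives the same break opportunities, computed once over the whole text, so a break opportunity after a non-space character (for example after a hyphen, depending on the JDK's rules) would become a penalty in whichever LM owns that character, with no later applyChanges()-style pass.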

Yes, we both seem to look for the same outcome. My (certainly not fully 
thought through) model was more along the lines of the iterator 
approach used by the fo's to iterate over their char sequences during 
refinement. However, the iterator should probably return not just a 
character but enough information for the Line LM to build the area info 
objects to attach to the Knuth elements, so that the addAreas phase 
works correctly later.
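A minimal Java sketch of that iterator idea, assuming a hypothetical CharInfo record and iterate() helper (not existing FOP API): each inline LM's iterator returns the character together with the LM it belongs to and its index there, which is the kind of information the Line LM would need to build the area info objects for its Knuth elements.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: iterators return a char plus where it came from.
class CharInfoIteration {

    /** One character, the inline LM that owns it, and its index in that LM. */
    record CharInfo(char c, String lmName, int index) {}

    /** Iterator over the characters of one (hypothetical) inline LM. */
    static Iterator<CharInfo> iterate(String lmName, String text) {
        return new Iterator<>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < text.length();
            }

            @Override
            public CharInfo next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                CharInfo info = new CharInfo(text.charAt(i), lmName, i);
                i++;
                return info;
            }
        };
    }

    /** The Line LM walks all child iterators in order, as one sequence. */
    static List<CharInfo> collect(List<Iterator<CharInfo>> children) {
        List<CharInfo> all = new ArrayList<>();
        for (Iterator<CharInfo> it : children) {
            while (it.hasNext()) {
                all.add(it.next());
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<CharInfo> chars = collect(List.of(iterate("text-1", "ab "), iterate("text-2", "cd")));
        for (CharInfo ci : chars) {
            System.out.println("'" + ci.c() + "' from " + ci.lmName() + " at " + ci.index());
        }
    }
}
```

The Line LM sees one continuous character sequence for its analysis, yet every character still records which LM and offset it came from.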

Very good discussion - my summary is:

a) We both seem to want the same outcome, that is, to add the required 
features and at the same time get rid of some of the workarounds 
currently used.

b) We both agree that the character by character analysis is done at 
Line LM level.

c) Your initial thought is that the Line LM should then provide enough 
information to the LMs to generate their Knuth sequences while my 
initial thought is that the Line LM generates the Knuth sequences and 
provides enough information for the LMs to generate their areas.

If you agree with this summary, maybe we can concentrate on discussing 
the pros and cons of the two approaches mentioned in item c) above?

> Regards
>      Luca


