Great summary, and I appreciate you taking a serious look into this
problem. Some comments below.
On Mon, 5 Dec 2005 05:48 pm, Luca Furini wrote:
> Manuel Mall wrote:
> > After that we really need to redesign the line breaking stuff. Not
> > the Knuth approach (and the implemented algorithms related to that)
> > but the way we arrive at the Knuth sequences and iterate and
> > process the text elements. This needs to be done to be able to do
> > white-space-treatment, UAX#14 line breaking, start- /end- space
> > resolution and generally to be able to handle some more aspects of
> > Unicode (e.g. glyph merging).
> Just trying to write down a few thoughts and summarise what we have
> already said about this:
> - the inline LMs can directly apply the linefeed-treatment property
> (ignoring, preserving or transforming the LF character) but have too
> limited a "view" to correctly handle white-space-treatment and
> white-space-collapse, and to count the number of letter spaces
I would formulate the problem slightly differently:
The line breaking logic requires inspection of adjacent characters in
the input, even if these characters are contained in different inline
fo's, in the following cases:
a) To determine line break possibilities in accordance with the Unicode
b) To be able to apply the white-space-treatment property around FOP
generated line breaks
c) To determine word boundaries, which are used
i) to calculate the number of letter spaces in a word
ii) to determine the actual words presented to the hyphenation
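To make case c) concrete, here is a minimal Java sketch of finding word boundaries over characters that come from different inline fo's. Fragment, Word and words() are hypothetical names made up for illustration, not existing FOP classes, and "a word is a run of letters or digits" is a simplifying assumption rather than the real rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: none of these types exist in FOP. */
final class CrossInlineWords {

    /** Text contributed by one inline fo (hypothetical). */
    static final class Fragment {
        final String owner;
        final String text;

        Fragment(String owner, String text) {
            this.owner = owner;
            this.text = text;
        }
    }

    /** A word and the number of letter spaces inside it (characters - 1). */
    static final class Word {
        final String text;
        final int letterSpaces;

        Word(String text) {
            this.text = text;
            this.letterSpaces = text.length() - 1;
        }
    }

    /** Collects words, letting a word continue across fragment boundaries. */
    static List<Word> words(List<Fragment> fragments) {
        List<Word> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Fragment fragment : fragments) {
            for (int i = 0; i < fragment.text.length(); i++) {
                char c = fragment.text.charAt(i);
                if (Character.isLetterOrDigit(c)) {
                    current.append(c);
                } else {
                    flush(current, result);
                }
            }
            // No flush here: the word may continue in the next inline fo.
        }
        flush(current, result);
        return result;
    }

    private static void flush(StringBuilder current, List<Word> result) {
        if (current.length() > 0) {
            result.add(new Word(current.toString()));
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<Word> found = words(Arrays.asList(
                new Fragment("inline-1", "hyphen"),
                new Fragment("inline-2", "ation is")));
        for (Word w : found) {
            System.out.println(w.text + " letterSpaces=" + w.letterSpaces);
        }
    }
}
```

The point of the sketch is only that the word "hyphenation" is seen as one unit although it is split over two inline fo's, which is what both the letter space count and the hyphenator need.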
> - the LineLM has to collect the "text" from its descendant nodes:
> non-textual objects should be taken into account too, as, for
> example, a leader between two spaces should prevent them from being
> collapsed; if spaces collapse only if they come from sibling nodes,
> this could maybe be handled during the collection by the
I have a slightly different view on the handling of spaces. We only need
to be concerned about white-space-treatment around line breaks we
generate. Everything else is already dealt with by the time the LMs are
invoked. This in turn means, IMO, that we only need to know how "big" the
glue element must be that is dropped when the Knuth algorithm actually
generates a line break. Determining the value for "big", however, means
we need to consider adjacent spaces even if they are contained in
different inline fo's.
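A rough Java sketch of what I mean by "big": walk over the text of adjacent inline fo's and sum the widths of each maximal run of spaces, so that a run spanning two fo's yields one glue width. Run, glueWidths() and the millipoint widths are assumptions made up for this illustration; nothing here is existing FOP code, and only the plain space character is considered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: none of these types exist in FOP. */
final class BreakGlueWidths {

    /** Text of one inline fo plus the width of a space in its font (hypothetical). */
    static final class Run {
        final String text;
        final int spaceWidth; // assumed to be in millipoints

        Run(String text, int spaceWidth) {
            this.text = text;
            this.spaceWidth = spaceWidth;
        }
    }

    /** Returns one total width per maximal run of adjacent spaces. */
    static List<Integer> glueWidths(List<Run> runs) {
        List<Integer> widths = new ArrayList<>();
        int pending = 0;
        for (Run run : runs) {
            for (int i = 0; i < run.text.length(); i++) {
                if (run.text.charAt(i) == ' ') {
                    // Keep accumulating, also across fo boundaries.
                    pending += run.spaceWidth;
                } else if (pending > 0) {
                    widths.add(pending);
                    pending = 0;
                }
            }
        }
        if (pending > 0) {
            widths.add(pending);
        }
        return widths;
    }

    public static void main(String[] args) {
        System.out.println(glueWidths(Arrays.asList(
                new Run("a ", 3000),
                new Run(" b", 4000))));
    }
}
```

Each returned value would be the width of the single glue element that disappears if the Knuth algorithm chooses a break at that position.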
> - the LineLM should then mark spaces that must be removed because
> they are trailing / leading, glyphs that must be merged (but which LM
> will paint them if the characters come from different text nodes?)
> and find the breaking points according to the Unicode rules
Glyphs are only allowed to be merged if they carry the same / matching
set of property values. Personally I would not be concerned if we
therefore limit that logic to within an LM. While it is possible that
someone could write something like
and the a and the combining diaeresis (U+0308) could be combined into an
&#x00e4;, IMO this is a pretty degenerate case.
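As a small illustration of the "same property values" condition, the Java sketch below composes two adjacent pieces of text only when their property maps are equal. It uses java.text.Normalizer (NFC composition) as a stand-in for glyph merging, which is a simplification; the class name, the join() method and the property maps are hypothetical and not part of FOP.

```java
import java.text.Normalizer;
import java.util.Collections;
import java.util.Map;

/** Illustrative only: not FOP code. */
final class GlyphMergeSketch {

    /**
     * Joins two adjacent pieces of text. Characters are composed across the
     * boundary only if both pieces carry equal property values.
     */
    static String join(String first, Map<String, String> firstProps,
                       String second, Map<String, String> secondProps) {
        String joined = first + second;
        if (!firstProps.equals(secondProps)) {
            return joined; // different properties: leave the characters separate
        }
        return Normalizer.normalize(joined, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        Map<String, String> red = Collections.singletonMap("color", "red");
        String merged = join("a", red, "\u0308", red);
        System.out.println(merged.length());
    }
}
```

With equal properties, "a" followed by U+0308 becomes the single character U+00E4; with differing properties the two characters stay as they are.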
> - the LineLM should somehow give the computed information to the
> descendant LM, that would use it to create at once the correct
Yes, it could, but I am in two minds about whether this is the best
approach, or whether the Line LM should create the Knuth sequences right
away and store enough information in them that the inline LMs can create
the correct areas during the addAreas phase.
> - the resulting sequences would be ready for the breaking phase,
> without further analysis / checks / substitutions / changes
> The revised interface for inline LMs could then have (just a quick
> idea) a new appendText(StringBuffer) method and a modified version of
> getNextKnuthElements() having some extra parameter storing the
> information created by the LineLM; we should finally get rid of
> addALetterSpaceTo(), getWordChars(), hyphenate(), applyChanges() and
Yes, we both seem to look for the same outcome. My (certainly not fully
thought through) model was more along the lines of the iterator
approach used by the fo's to iterate over their char sequences during
refinement. However, the iterator should probably not just return a
character but enough information for the Line LM to build the area info
objects to attach to the Knuth elements so that the add areas phase
works correctly later.
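To give an idea of the iterator model, here is a minimal Java sketch in which the iterator returns each character together with where it came from, so the Line LM could later attach that to the Knuth elements. Source, CharInfo and InlineCharIterator are hypothetical names for this illustration only, and the "owner" string stands in for whatever reference to the inline LM would really be needed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Illustrative only: none of these types exist in FOP. */
final class InlineCharIterator implements Iterator<InlineCharIterator.CharInfo> {

    /** Text owned by one inline LM (hypothetical). */
    static final class Source {
        final String owner;
        final String text;

        Source(String owner, String text) {
            this.owner = owner;
            this.text = text;
        }
    }

    /** One character plus enough context to find its origin again. */
    static final class CharInfo {
        final char ch;
        final String owner;
        final int offset; // index of the character within its owner's text

        CharInfo(char ch, String owner, int offset) {
            this.ch = ch;
            this.owner = owner;
            this.offset = offset;
        }
    }

    private final List<Source> sources;
    private int sourceIndex = 0;
    private int offset = 0;

    InlineCharIterator(List<Source> sources) {
        this.sources = sources;
        skipExhausted();
    }

    /** Moves past sources that are empty or fully consumed. */
    private void skipExhausted() {
        while (sourceIndex < sources.size()
                && offset >= sources.get(sourceIndex).text.length()) {
            sourceIndex++;
            offset = 0;
        }
    }

    @Override
    public boolean hasNext() {
        return sourceIndex < sources.size();
    }

    @Override
    public CharInfo next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Source source = sources.get(sourceIndex);
        CharInfo info = new CharInfo(source.text.charAt(offset), source.owner, offset);
        offset++;
        skipExhausted();
        return info;
    }

    public static void main(String[] args) {
        Iterator<CharInfo> it = new InlineCharIterator(Arrays.asList(
                new Source("text-1", "ab"),
                new Source("text-2", "c")));
        while (it.hasNext()) {
            CharInfo info = it.next();
            System.out.println(info.ch + " from " + info.owner + " at " + info.offset);
        }
    }
}
```

The Line LM would see one continuous character stream, while each element still records its owner and offset for the add areas phase.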
Very good discussion - my summary is:
a) We both seem to want the same outcome, that is, to add the required
features and at the same time get rid of some of the workarounds
currently used.
b) We both agree that the character by character analysis is done at
Line LM level.
c) Your initial thought is that the Line LM should then provide enough
information to the LMs to generate their Knuth sequences while my
initial thought is that the Line LM generates the Knuth sequences and
provides enough information for the LMs to generate their areas.
If you agree with this summary, maybe we can concentrate on discussing
the pros and cons of the two approaches mentioned in item c) above?