J.Pietschmann wrote:
Be careful with the various TRs: UTR14 does not deal with character
(rather: grapheme) or word boundaries, that's UAX #29. Actually, we
don't use the latter.
Our line breaking should probably be done the following way (this
implements the "naive" paragraph filling strategy):
  loop
    calculate line width if next character is added
    check for a line breaking opportunity before the next character
    if there is an opportunity
      if the line is not full
        discard the last saved opportunity and save this
      else
        try hyphenation on the string accumulated since the
          last break opportunity (if enabled), save returned
          opportunity if any
        return saved line breaking opportunity
      end if
    end if
  end loop

hyphenation of a string:
 loop
   skip non-word characters (for this hyphenator)
   word = continuous run of word characters (for this hyphenator)
   if the end of the word is past the end of the line
     try hyphenating the word, generate new break opportunities
     return best fitting line break opportunity or null
   end if
 end loop
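
The two loops above can be sketched in Python roughly as follows. This is only an illustration under loud assumptions: every character is one unit wide, a break opportunity exists only after a run of spaces or at the end of the text (a crude stand-in for the UTR14 rules), and an overflowing line with no usable opportunity is simply allowed to overflow. The names find_line_break, fill_paragraph and toy_hyphenate are hypothetical, not FOP code.

```python
def find_line_break(text, start, max_width, hyphenate=None):
    """Return (end, hyphenated) so that text[start:end] goes on the line."""
    saved = None  # last break opportunity that still fits
    for i in range(start + 1, len(text) + 1):
        # Assumed rule: an opportunity exists after a run of spaces and at the end.
        if i < len(text) and not (text[i - 1] == " " and text[i] != " "):
            continue
        width = len(text[start:i].rstrip())  # one unit per character (assumption)
        if width <= max_width:
            saved = i  # line not full: replace the saved opportunity
            continue
        # Line is full: try hyphenating the run since the last opportunity.
        word_start = saved if saved is not None else start
        word = text[word_start:i].rstrip()
        if hyphenate is not None:
            used = word_start - start
            fitting = [k for k in hyphenate(word)
                       if 0 < k < len(word) and used + k + 1 <= max_width]
            if fitting:
                return word_start + max(fitting), True
        # Degenerate case (nothing fits): let the line overflow (assumption).
        return (saved if saved is not None else i), False
    return (saved if saved is not None else len(text)), False


def fill_paragraph(text, max_width, hyphenate=None):
    """Naive filling: take the longest fitting line, then repeat."""
    lines, start = [], 0
    while start < len(text):
        end, hyphenated = find_line_break(text, start, max_width, hyphenate)
        line = text[start:end].rstrip()
        lines.append(line + "-" if hyphenated else line)
        start = end
    return lines


def toy_hyphenate(word):
    """Hypothetical hyphenator: allow a break anywhere that leaves
    at least two characters on each side."""
    return range(2, len(word) - 1)
```

With this toy hyphenator, fill_paragraph("a hyphenation test", 8, toy_hyphenate) gives ["a hyphe-", "nation", "test"]. A real hyphenator would return only linguistically valid positions.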

There is the degenerate case if the line overflows and no line break
opportunity is discovered at all.
The TeX paragraph filling strategy has to detect line break opportunities
the same way but selects the opportunities turning into actual line breaks
in a more clever way. We could do that too.
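
To make "a more clever way" concrete, here is a much simplified sketch of choosing breaks over the whole paragraph instead of line by line. It is not TeX's actual algorithm (which works with glue, penalties and demerits); it merely minimises the sum of squared leftover space on every line but the last. Fixed-width characters, breaks only between words, and the name best_fit_lines are all assumptions made for the illustration.

```python
def best_fit_lines(words, max_width):
    """Pick breaks that minimise the total squared slack (last line is free)."""
    n = len(words)
    cost = [float("inf")] * (n + 1)  # cost[i]: best cost for setting words[i:]
    next_break = [n] * (n + 1)       # next_break[i]: end of the line starting at i
    cost[n] = 0
    for i in range(n - 1, -1, -1):
        width = -1  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            width += len(words[j - 1]) + 1
            overflow = width > max_width
            if overflow and j > i + 1:
                break  # the line would be too long: stop extending it
            # A single overlong word is accepted as is (assumption).
            line_cost = 0 if (j == n or overflow) else (max_width - width) ** 2
            total = line_cost + cost[j]
            if total < cost[i]:
                cost[i], next_break[i] = total, j
            if overflow:
                break
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:next_break[i]]))
        i = next_break[i]
    return lines
```

For the words "aaa bb cc ddddd" and a width of 6, naive filling would produce "aaa bb" / "cc" / "ddddd", while this sketch returns ["aaa", "bb cc", "ddddd"], which spreads the leftover space more evenly.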

In my own thinking about the process of line-breaking, I have always assumed that a (possibly recursive) block of text is a fixed resource; a superset of the fixed resource that is a single glyph/grapheme with given font attributes. As such, it should be processed by a separate co-routine (to use the language of the Rec). All of the information about the hierarchy of potential break positions is determined by the text itself.


As a first cut, I would determine all potential breaks, along with information relevant to later line-height calculations, at the time a block is first prepared for layout. The co-routine (thread, whatever) that is grooming the text would then respond to enquiries about line-area possibilities, and eventually return contents for line-areas of particular dimensions. All of this is tentative, and all of the calculated information about the block would have to be held until the layout of the block is finalised.

What "finalised" means depends on the complexity of the layout strategies employed, but at a minimum, the calculated information must be retained until the last page containing text from the block, and the subsequent page (if any), have been laid out, to allow for backtracking during last-page processing.
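
A rough Python sketch of what I mean, with many assumptions: the class name PreparedBlock and its methods are hypothetical, characters are one unit wide, breaks fall only after runs of spaces, and the line-height information is left out entirely. The point is only that the breaks are computed once, and that enquiries are tentative and can be repeated with different dimensions while the block's layout is still open.

```python
from bisect import bisect_right


class PreparedBlock:
    """Holds a block's text and its precomputed break positions."""

    def __init__(self, text):
        self.text = text
        # Computed once, when the block is first prepared for layout.
        self.breaks = [i for i in range(1, len(text))
                       if text[i - 1] == " " and text[i] != " "]
        self.breaks.append(len(text))

    def propose_line(self, start, max_width):
        """Answer an enquiry: where could a line starting at `start` end?
        Nothing is committed; returns None if no break fits."""
        best = None
        for b in self.breaks[bisect_right(self.breaks, start):]:
            if len(self.text[start:b].rstrip()) > max_width:
                break
            best = b
        return best

    def line_text(self, start, end):
        """Return the contents for a line-area that has been decided on."""
        return self.text[start:end].rstrip()


block = PreparedBlock("the quick brown fox jumps over the lazy dog")
narrow = block.propose_line(0, 10)  # enquiry for a narrow line-area
wide = block.propose_line(0, 20)    # same start, wider area: backtracking is free
first_line = block.line_text(0, wide)
```

Because propose_line never changes any state, the layout engine can ask again from the same offset after a page has been re-laid out, which is what holding the information until the block is finalised buys us.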

Peter
--
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html>


