RT: line breaking

J.Pietschmann Wed, 12 Nov 2003 14:49:43 -0800

Victor Mote wrote:

I know of at least two line-breaking strategies that we probably want to
have in our stock strategies: 1) the line-by-line method used right now, and
2) a TeX-like paragraph-oriented strategy, which AFAIK doesn't exist yet.


Ahem, that's neither what I meant nor the scope of UTR14. UTR14 provides for
"line break opportunities": for example, you can break foo-bar after the
hyphen but not 789-123. Which opportunities are actually used is another
matter. FOP's current algorithm for determining line break opportunities is
utterly simplistic, basically "possibly break before any breaking space, or
after a hyphen or slash", where the latter is done only if hyphenation is
enabled.
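
To make the difference concrete, here is a rough sketch in Python (not FOP
code; the function names are made up for illustration). The first function
mimics the simplistic rule quoted above. The second adds a single
UTR14-flavoured refinement, "no break after a hyphen that precedes a digit",
which is only a tiny stand-in for the real rule set.

```python
def simple_break_opportunities(text, hyphenation_enabled=False):
    """Return every index i at which a break is allowed before text[i]."""
    opportunities = []
    for i in range(1, len(text)):
        if text[i] == " ":
            # Possibly break before any breaking space.
            opportunities.append(i)
        elif hyphenation_enabled and text[i - 1] in ("-", "/"):
            # Possibly break after a hyphen or slash.
            opportunities.append(i)
    return opportunities


def digit_aware_break_opportunities(text):
    """Same, but suppress the break after a hyphen that precedes a digit.

    Assumption: this one rule stands in for UTR14; it is not the full algorithm.
    """
    return [
        i
        for i in simple_break_opportunities(text, hyphenation_enabled=True)
        if not (text[i - 1] == "-" and text[i].isdigit())
    ]
```

With hyphenation enabled, the simplistic rule offers a break inside both
"foo-bar" and "789-123"; the refined variant keeps the first and drops the
second.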

I omitted the forced line break issue, which is also in the UTR14 scope,
and hyphenation, which may lead to additional line break opportunities
but is outside of the UTR14 scope.

In your URL example, couldn't FOP see the "x-url" language & automatically
add or assume the glue characters for the user? That would perhaps make it
less obtrusive (I assume that you meant for the user).

Well, yes.

I don't see it there yet, but I am a little confused. It seems to me that
line-breaking consists of at least these components: 1) character-based
line-breaking opportunities (which UTR14 addresses), 2) word-based
line-breaking opportunities (which hyphenation dictionaries and patterns
address), and 3) some strategy for using these to find acceptable/optimal
line breaks. It sounds like you have addressed at least 1 and 3 in your
implementation.


Paragraph filling (your point 3) is not addressed.
Be careful with the various TRs: UTR14 does not deal with character
(rather: grapheme) or word boundaries; that's UAX #29. Actually, we
don't use the latter.
Our line breaking should probably be done the following way (this
implements the "naive" paragraph filling strategy):
  loop
    calculate line width if next character is added
    check for a line breaking opportunity before the next character
    if there is an opportunity
      if the line is not full
        discard the last saved opportunity and save this
      else
        try hyphenation on the string accumulated since the
          last break opportunity (if enabled), save returned
          opportunity if any
        return saved line breaking opportunity
      end if
    end if
  end loop
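
The loop above could look roughly like this in Python. This is a sketch under
assumptions, not FOP code: every character is one unit wide by default, the
end of the text counts as a break opportunity, and the names next_line_break,
before_space, fill_paragraph and the hyphenate callback are invented for
illustration. A return value of None marks an overflowing line for which no
opportunity was found.

```python
def before_space(text, i):
    """Simplistic opportunity test: a break is allowed before a space."""
    return text[i] == " "


def next_line_break(text, start, max_width, is_opportunity,
                    char_width=lambda ch: 1, hyphenate=None):
    """Return the index at which the line starting at `start` should end."""
    width = 0
    saved = None
    for i in range(start + 1, len(text) + 1):
        width += char_width(text[i - 1])      # width of text[start:i]
        at_end = i == len(text)
        if at_end or is_opportunity(text, i):
            if width <= max_width:
                saved = i                     # discard the old one, save this
                continue
            if hyphenate is not None:
                # Hypothetical hook: look for a hyphenation point in the
                # text accumulated since the last saved opportunity.
                seg_start = saved if saved is not None else start
                extra = hyphenate(text, start, seg_start, i)
                if extra is not None:
                    saved = extra
            return saved
    return saved


def fill_paragraph(text, max_width):
    """Greedy line-by-line filling; returns the list of lines."""
    lines, start = [], 0
    while start < len(text):
        end = next_line_break(text, start, max_width, before_space)
        if end is None:
            # No usable opportunity: let the line overflow up to the next
            # opportunity (or the end of the text).
            end = next((i for i in range(start + 1, len(text))
                        if before_space(text, i)), len(text))
        lines.append(text[start:end])
        start = end
        while start < len(text) and text[start] == " ":
            start += 1                        # drop spaces at the line start
    return lines
```

For example, fill_paragraph("aaa bbb ccc", 7) yields the two lines "aaa bbb"
and "ccc".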

Hyphenation of a string:
  loop
    skip non-word characters (for this hyphenator)
    word = continuous run of word characters (for this hyphenator)
    if the end of the word is past the end of the line
      try hyphenating the word, generate new break opportunities
      return best fitting line break opportunity or null
    end if
  end loop
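
A possible Python rendering of the hyphenation step, again only a sketch. It
assumes one width unit per character, reserves one unit for the hyphen glyph,
and takes the hyphenator as a callback (hyphen_points, a made-up name) that
returns the offsets inside a word at which a hyphen may be inserted. The toy
dictionary stands in for real hyphenation patterns.

```python
def hyphenate_segment(text, line_start, seg_start, seg_end, max_width,
                      hyphen_points, is_word_char=str.isalpha):
    """Return the best fitting break index inside text[seg_start:seg_end],
    or None if no hyphenation point fits on the line."""
    i = seg_start
    while i < seg_end:
        # Skip non-word characters (for this hyphenator).
        while i < seg_end and not is_word_char(text[i]):
            i += 1
        # Collect a continuous run of word characters.
        word_start = i
        while i < seg_end and is_word_char(text[i]):
            i += 1
        word = text[word_start:i]
        if word and i - line_start > max_width:
            # The word ends past the end of the line: try to hyphenate it.
            fitting = [
                word_start + offset
                for offset in hyphen_points(word)
                # +1 leaves room for the hyphen glyph itself.
                if word_start + offset - line_start + 1 <= max_width
            ]
            return max(fitting, default=None)
    return None


# Toy hyphenator (assumption): "hy-phen-ation".
TOY_PATTERNS = {"hyphenation": [2, 6]}


def toy_hyphen_points(word):
    return TOY_PATTERNS.get(word, [])
```

With the text "simple hyphenation" and a line width of 12, the call
hyphenate_segment(text, 0, 6, len(text), 12, toy_hyphen_points) returns 9, so
the line would be set as text[0:9] plus a hyphen.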

There is a degenerate case when the line overflows and no line break
opportunity has been discovered at all.
The TeX paragraph filling strategy has to detect line break opportunities
the same way, but it is cleverer about selecting which opportunities become
actual line breaks. We could do that too.

This seems at least remotely related to fo.FOText.isWordChar(), which
attempts to find breaks between words.


Actually, we don't need breaks between words. We need to identify line
breaking opportunities, words for the purpose of hyphenation, and
resizable spaces for justification.
That's why WordArea was such a bad name.

J.Pietschmann
