I know of at least two line-breaking strategies that we probably want to have in our stock strategies: 1) the line-by-line method used right now, and 2) a TeX-like paragraph-oriented strategy, which AFAIK doesn't exist yet.
Ahem, that's not what I meant, nor is it the scope of UTR14. UTR14 provides for "line break opportunities": for example, you can break foo-bar after the hyphen but not 789-123. Which opportunities are actually used is another matter. FOP's current algorithm for determining line break opportunities is utterly simplistic, basically "possibly break before any breaking space, or after a hyphen or slash"; the latter is done only if hyphenation is enabled.
I omitted the forced line break issue, which is also in the UTR14 scope, and hyphenation, which may lead to additional line break opportunities but is outside of the UTR14 scope.
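To make the "utterly simplistic" rule concrete, here is a minimal sketch of it. The class `SimpleBreakOpportunities` and its method `find` are hypothetical names made up for illustration; this is not FOP's actual code, and it treats only the plain ASCII space as a breaking space.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of FOP: lists simplistic line break opportunities. */
final class SimpleBreakOpportunities {

    /**
     * Returns the indices at which a line break may occur. An index i means
     * "a break is allowed between text.charAt(i - 1) and text.charAt(i)".
     */
    static List<Integer> find(String text, boolean hyphenationEnabled) {
        List<Integer> opportunities = new ArrayList<>();
        for (int i = 1; i < text.length(); i++) {
            char prev = text.charAt(i - 1);
            char next = text.charAt(i);
            if (next == ' ') {
                // Possible break before any breaking space.
                opportunities.add(i);
            } else if (hyphenationEnabled && (prev == '-' || prev == '/')) {
                // Possible break after a hyphen or slash, only with hyphenation on.
                opportunities.add(i);
            }
        }
        return opportunities;
    }
}
```

With hyphenation enabled, `find("foo-bar", true)` reports index 4, the position right after the hyphen. It reports the same index for "789-123", which is exactly the kind of break that UTR14 would not allow.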
In your URL example, couldn't FOP see the "x-url" language & automatically add or assume the glue characters for the user? That would perhaps make it less obtrusive (I assume that you meant for the user).
Well, yes.
I don't see it there yet, but I am a little confused. It seems to me that line-breaking consists of at least these components: 1) character-based line-breaking opportunities (which UTR14 addresses), 2) word-based line-breaking opportunities (which hyphenation dictionaries and patterns address), and 3) some strategy for using these to find acceptable/optimal line breaks. It sounds like you have addressed at least 1 and 3 in your implementation.
Paragraph filling (your point 3) is not addressed. Be careful with the various TRs: UTR14 does not deal with character (rather: grapheme) or word boundaries, that's UAX #29. Actually, we don't use the latter. Our line breaking should probably be done the following way (this implements the "naive" paragraph filling strategy):

  loop
    calculate line width if next character is added
    check for a line breaking opportunity before the next character
    if there is an opportunity
      if the line is not full
        discard the last saved opportunity and save this one
      else
        try hyphenation on the string accumulated since the last
          break opportunity (if enabled), save returned opportunity if any
        return saved line breaking opportunity
      end if
    end if
  end loop
hyphenation of a string:

  loop
    skip non-word characters (for this hyphenator)
    word = continuous run of word characters (for this hyphenator)
    if the end of the word is past the end of the line
      try hyphenating the word, generate new break opportunities
      return best fitting line break opportunity or null
    end if
  end loop
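The naive filling loop above could be sketched in Java roughly as follows. Everything here is an assumption made for illustration: the class `NaiveLineFiller`, the rule in `isOpportunity` (break before a space, or after a hyphen or slash), and the width model where every character is one unit wide. The hyphenation step is left out and only marked with a comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the "naive" (first-fit) filling loop; not FOP code. */
final class NaiveLineFiller {

    /** Simplified rule: break before a space, or after a hyphen or slash. */
    static boolean isOpportunity(String text, int i) {
        char prev = text.charAt(i - 1);
        return text.charAt(i) == ' ' || prev == '-' || prev == '/';
    }

    /**
     * Returns the index where the line starting at start ends.
     * Assumes each character is one unit wide (an assumption of this sketch).
     */
    static int nextBreak(String text, int start, int maxWidth) {
        int saved = -1; // last opportunity that still fits on the line
        for (int i = start + 1; i < text.length(); i++) {
            if (!isOpportunity(text, i)) {
                continue;
            }
            int width = i - start; // width of text[start, i)
            if (width <= maxWidth) {
                saved = i; // discard the previously saved opportunity
            } else {
                // The line is full. A hyphenator would be tried here on the
                // text since the last saved opportunity (omitted in this sketch).
                return saved >= 0 ? saved : i; // no saved break: let the line overflow
            }
        }
        // End of text reached: take the rest if it fits or nothing better exists.
        if (text.length() - start <= maxWidth || saved < 0) {
            return text.length();
        }
        return saved;
    }

    /** Splits text into lines by repeatedly taking the next break. */
    static List<String> fill(String text, int maxWidth) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = nextBreak(text, start, maxWidth);
            lines.add(text.substring(start, end));
            start = end;
            // Drop the spaces at the break so the next line does not start with them.
            while (start < text.length() && text.charAt(start) == ' ') {
                start++;
            }
        }
        return lines;
    }
}
```

For example, `NaiveLineFiller.fill("the quick brown fox", 10)` yields the two lines "the quick" and "brown fox", and `fill("foo-bar baz", 5)` yields "foo-", "bar" and "baz".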
There is the degenerate case where the line overflows and no line break opportunity is discovered at all. The TeX paragraph filling strategy has to detect line break opportunities the same way, but it is more clever in selecting which opportunities become actual line breaks. We could do that too.
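To illustrate what "more clever" selection can mean, here is a much simplified sketch. It is not TeX's actual algorithm (no glue, badness or penalties); it is a plain dynamic program that picks breaks between words so that the sum of squared leftover space over all lines except the last is minimal. The class `OptimalLineFiller` and the one-unit-per-character width model are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of paragraph-wide break selection; not TeX's algorithm. */
final class OptimalLineFiller {

    /** Breaks words into lines, minimizing squared slack on all but the last line. */
    static List<String> fill(String[] words, int maxWidth) {
        int n = words.length;
        long[] cost = new long[n + 1]; // cost[i]: best cost for words[i..n)
        int[] next = new int[n + 1];   // next[i]: first word of the following line
        for (int i = n - 1; i >= 0; i--) {
            cost[i] = Long.MAX_VALUE;
            int width = -1; // cancels the space counted before the first word
            for (int j = i; j < n; j++) {
                width += words[j].length() + 1; // words i..j joined by single spaces
                if (width > maxWidth && j > i) {
                    break; // line too long; a single overlong word is tolerated
                }
                long slack = Math.max(0, maxWidth - width);
                long penalty = (j == n - 1) ? 0 : slack * slack; // last line is free
                long total = penalty + cost[j + 1];
                if (total < cost[i]) {
                    cost[i] = total;
                    next[i] = j + 1;
                }
            }
        }
        // Follow the recorded choices from the first word onwards.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i = next[i]) {
            lines.add(String.join(" ", Arrays.copyOfRange(words, i, next[i])));
        }
        return lines;
    }
}
```

With the words "aaa", "bb", "cc", "ddddd" and a width of 6, a first-fit strategy would produce "aaa bb" / "cc" / "ddddd", leaving a very short second line. This sketch instead returns "aaa" / "bb cc" / "ddddd", because it considers the paragraph as a whole.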
This seems at least remotely related to fo.FOText.isWordChar(), which attempts to find breaks between words.
Actually, we don't need breaks between words. We need to identify line breaking opportunities, words for the purpose of hyphenation, and resizable spaces for justification. That's why WordArea was such a bad name.
J.Pietschmann