Chuck Bearden wrote:

If in a left-aligned block some typical text words are followed by a string
longer than the line-length and containing no spaces (e.g. a long URL), then the
foregoing text will have premature line breaks, i.e. halfway to two-thirds the
way into the line.

I had a look at this, and what I found out is that the strange-looking lines are the combined effect of three different problems. So, sorry in advance for the long post, but breaking is never an easy matter! :-)


1) TextLM also breaks the text at a "/" or a "-", handling them as hyphenation points with the usual sequence of glue + penalty + glue elements.

The LineLM first tries to avoid using hyphenation points, so the penalty is not taken into account. But this has the side effect of making the first glue element a feasible break (if the penalty were a feasible break too, it would surely be a better one, so the glue would not actually be chosen).

This is probably the smallest of the problems, and it can be solved just by adding an infinite penalty before the first glue element. But maybe we want to prevent these breaks altogether, now that we can use zero-width spaces to insert break positions explicitly?
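To make point 1 concrete, here is a minimal Python sketch. FOP itself is Java, so `Box`, `Glue`, `Penalty`, `legal_breaks` and `break_after_slash` are hypothetical names, and all widths, stretch and penalty values are made up. The sketch relies only on the standard Knuth-Plass rules: a glue is a legal break only when it immediately follows a box, and a penalty only when its value is below infinity. An infinite penalty in front of the first glue therefore removes that glue as a break. Skipping flagged penalties stands in for the pass that avoids hyphenation points.

```python
from dataclasses import dataclass

INFINITE = 1000  # Knuth's convention: a penalty >= this value forbids a break


@dataclass
class Box:
    width: float


@dataclass
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass
class Penalty:
    width: float
    penalty: float
    flagged: bool = False


def legal_breaks(elements, allow_hyphenation=True):
    """Indices of legal breakpoints under the Knuth-Plass rules:
    a penalty below INFINITE, or a glue immediately preceded by a box.
    Flagged (hyphenation) penalties are skipped when allow_hyphenation
    is False, mimicking a pass that avoids hyphenation points."""
    result = []
    for i, e in enumerate(elements):
        if isinstance(e, Penalty):
            if e.penalty < INFINITE and (allow_hyphenation or not e.flagged):
                result.append(i)
        elif isinstance(e, Glue) and i > 0 and isinstance(elements[i - 1], Box):
            result.append(i)
    return result


def break_after_slash(guard):
    """Hypothetical elements for a word containing "/": the text up to the
    "/" as a box, then glue + flagged penalty + glue, then the rest."""
    elements = [Box(6.0)]
    if guard:
        # Infinite penalty: the following glue is no longer preceded by a box.
        elements.append(Penalty(0.0, INFINITE))
    elements += [
        Glue(0.0, 3.0, 0.0),
        Penalty(0.0, 50.0, flagged=True),
        Glue(0.0, -3.0, 0.0),
        Box(7.0),
    ]
    return elements


if __name__ == "__main__":
    for guard in (False, True):
        elems = break_after_slash(guard)
        print(guard, legal_breaks(elems), legal_breaks(elems, allow_hyphenation=False))
    # False [1, 2] [1]   <- without the guard, the first glue survives as a break
    # True [3] []        <- with the guard, only the (flagged) penalty remains
```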


2) The presence of an inline object larger than the available width makes the algorithm deactivate all the active nodes and then restart from a "second-hand" node, as no line can be built that does not overflow. In BreakingAlgorithm.findBreakingPoints(), the restarting node was chosen between lastTooShort and lastTooLong, neither of which is a "good" breaking point. There is also a lastDeactivated node, chosen among the deactivated nodes, but it was not used.

A deactivated node was previously an active one, so it is surely better than a node that "failed to qualify"; replacing either lastTooShort or lastTooLong (according to the adjustment) with lastDeactivated leads to a better set of breaks. However, this is not enough: the attached file small.20.pdf shows the result after fixing these first two problems.
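The restart choice in point 2 could be sketched roughly as follows. This is a simplified Python illustration, not the actual code in BreakingAlgorithm.findBreakingPoints(). `KnuthNode` and `choose_restart_node` are hypothetical. The fallback order between lastTooShort and lastTooLong is an assumption, since the real choice depends on the adjustment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnuthNode:
    """Hypothetical minimal node: a candidate breakpoint and its total demerits."""
    position: int
    total_demerits: float


def choose_restart_node(
    last_deactivated: Optional[KnuthNode],
    last_too_short: Optional[KnuthNode],
    last_too_long: Optional[KnuthNode],
) -> KnuthNode:
    """Pick the node to restart from once every active node has been deactivated.

    A deactivated node was once a feasible (active) breakpoint, so it is
    preferred; the too-short / too-long candidates are only a fallback.
    """
    if last_deactivated is not None:
        return last_deactivated
    # Assumed fallback order, for illustration only.
    if last_too_short is not None:
        return last_too_short
    if last_too_long is not None:
        return last_too_long
    raise ValueError("no node available to restart from")


if __name__ == "__main__":
    deactivated = KnuthNode(12, 340.0)
    too_short = KnuthNode(15, 9000.0)
    print(choose_restart_node(deactivated, too_short, None).position)  # 12
```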


3) At the moment, the LineLM can call findBreakingPoints() up to three times, the last one with a maximum adjustment ratio of 20. I came to the conclusion that this is really TOO much: I tried stopping after the second call (with max ratio = 5) and the result is much better (see attached file small.5.pdf).
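The multi-pass behaviour in point 3 amounts to retrying with looser thresholds until one pass succeeds. Here is a minimal Python sketch under some assumptions. `find_breaks_multipass` and `make_stub_pass` are hypothetical names, and the `try_pass(max_ratio)` callback returns break positions or None. The first threshold (1.0) and the stub pass are invented for illustration, while 5 and 20 are the values mentioned above.

```python
from typing import Callable, List, Optional, Sequence, Tuple

BreakList = List[int]


def find_breaks_multipass(
    try_pass: Callable[[float], Optional[BreakList]],
    thresholds: Sequence[float],
) -> Optional[Tuple[float, BreakList]]:
    """Run one breaking pass per threshold, loosest last, and return the
    first successful (threshold, breaks) pair, or None if every pass fails."""
    for max_ratio in thresholds:
        breaks = try_pass(max_ratio)
        if breaks is not None:
            return max_ratio, breaks
    # In this sketch the caller must handle failure (e.g. by restarting).
    return None


def make_stub_pass(required_ratio: float) -> Callable[[float], Optional[BreakList]]:
    """Hypothetical stand-in for a real pass: it succeeds only when the
    paragraph needs no more stretching than max_ratio allows."""
    def try_pass(max_ratio: float) -> Optional[BreakList]:
        return [10, 25] if required_ratio <= max_ratio else None
    return try_pass


if __name__ == "__main__":
    needs_12 = make_stub_pass(12.0)
    print(find_breaks_multipass(needs_12, (1.0, 5.0, 20.0)))  # (20.0, [10, 25])
    print(find_breaks_multipass(needs_12, (1.0, 5.0)))        # None
```

Dropping the last threshold from the tuple corresponds to stopping after the second call. A paragraph that only fits with very loose spacing then falls through to whatever recovery the caller implements, instead of being accepted with huge gaps.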

A high maximum adjustment ratio means that the algorithm is allowed to stretch spaces a lot in order to find a set of breaks which is *globally* better; this means that it can choose some not-so-beautiful breaks in order to build a set spanning over a larger portion of the paragraph.

In our example, a break just before the long URL (a line ending after "Consider:") is possible only with an enormous adjustment ratio. With a smaller, more appropriate threshold, "Consider:" can no longer end a line, so the algorithm will restart from an earlier point.
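The effect of the threshold on the "Consider:" line can be checked with the usual Knuth-Plass adjustment ratio. The ratio r is the missing width divided by the line's total stretchability, or the excess width divided by its total shrinkability. A line is feasible when -1 <= r <= max ratio. The numbers below are hypothetical: a 120pt line whose content is 60pt wide and offers only 4pt of total stretch, as when a few short words precede a long URL that has been pushed to the next line. `adjustment_ratio` and `is_feasible` are illustrative names, not FOP methods.

```python
from typing import Optional


def adjustment_ratio(line_width: float, natural_width: float,
                     total_stretch: float, total_shrink: float) -> Optional[float]:
    """Knuth-Plass adjustment ratio r of a candidate line, or None when the
    line would need stretching/shrinking it does not have."""
    diff = line_width - natural_width
    if diff > 0:
        return diff / total_stretch if total_stretch > 0 else None
    if diff < 0:
        return diff / total_shrink if total_shrink > 0 else None
    return 0.0


def is_feasible(ratio: Optional[float], max_ratio: float) -> bool:
    """Feasible if the line shrinks no more than its full shrink (r >= -1)
    and stretches no more than the threshold allows (r <= max_ratio)."""
    return ratio is not None and -1.0 <= ratio <= max_ratio


if __name__ == "__main__":
    # Hypothetical line ending after "Consider:"
    r = adjustment_ratio(120.0, 60.0, 4.0, 2.0)
    print(r, is_feasible(r, 5.0), is_feasible(r, 20.0))  # 15.0 False True
```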


In conclusion: the first two items are easily fixed, and I'm going to commit the changes this afternoon (if there are no objections). As for the automatic breaks at "/" and "-" characters, I'll probably leave the code unchanged for the moment, until we decide what is best.

Concerning point #3, I'm going to have a closer look at the restarting mechanism ...

Regards
    Luca

Attachment: small.20.pdf
Description: Adobe PDF document

Attachment: small.5.pdf
Description: Adobe PDF document
