Hi Luca,

Luca Furini wrote:
<snip/>
> 1) TextLM breaks the text even when a "/" or a "-" is found, handling
> them as hyphenation points with the usual sequence of glue + penalty +
> glue elements.
> 
> The LineLM tries, in the first instance, to avoid using hyphenation
> points, so the penalty is not taken into account. But this has the side
> effect of using the first glue element as a feasible break (if the
> penalty were a feasible break too, it would surely be a better one,
> thus preventing the glue from actually being chosen).

I don't follow you: IIUC the glue-penalty-glue triplet is generated only
on the second pass, when the first breaking attempt doesn't give
acceptable results? What do you mean by "the penalty is not taken into
account"?

Also, I don't see why the penalty would be preferred over the glue, as
it has a positive penalty value.
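
For reference, here is a minimal, self-contained sketch of the break
legality rule as I understand it (the element classes below are made up
for illustration, not FOP's real ones):

    import java.util.Arrays;
    import java.util.List;

    public class BreakRuleSketch {
        enum Kind { BOX, GLUE, PENALTY }
        static class Elem {
            final Kind kind; final int penalty;
            Elem(Kind k, int p) { kind = k; penalty = p; }
        }
        static final int INF = 1000; // the "infinite" penalty value

        // Knuth's rule: a glue is a legal break iff it follows a box;
        // a penalty is legal iff its value is below infinity.
        static boolean isLegalBreak(List<Elem> seq, int i) {
            Elem e = seq.get(i);
            switch (e.kind) {
                case PENALTY: return e.penalty < INF;
                case GLUE:    return i > 0 && seq.get(i - 1).kind == Kind.BOX;
                default:      return false;
            }
        }

        public static void main(String[] args) {
            // The glue + penalty + glue triplet after a '/' box (point 1):
            List<Elem> seq = Arrays.asList(
                new Elem(Kind.BOX, 0),      // "...text/"
                new Elem(Kind.GLUE, 0),     // legal break: it follows a box
                new Elem(Kind.PENALTY, 50), // the intended hyphenation point
                new Elem(Kind.GLUE, 0),
                new Elem(Kind.BOX, 0));     // "more..."
            // True even when the pass skips hyphenation penalties:
            System.out.println(isLegalBreak(seq, 1));
        }
    }

If that rule is what the code implements, the first glue stays feasible
whether or not the penalty is considered, which would explain the
behavior you describe.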


> This is probably the smallest of the problems, and can be solved just
> by adding an infinite penalty before the first glue element. But maybe we

This seems to be a good idea, anyway.
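
Concretely (reusing the Elem/Kind types from the sketch above), the
fixed sequence around the '/' would look something like:

    List<Elem> fixed = Arrays.asList(
        new Elem(Kind.BOX, 0),       // "...text/"
        new Elem(Kind.PENALTY, INF), // forbids a break at the next glue
        new Elem(Kind.GLUE, 0),      // no longer follows a box -> not a break
        new Elem(Kind.PENALTY, 50),  // the hyphenation point, now the only break
        new Elem(Kind.GLUE, 0),
        new Elem(Kind.BOX, 0));      // "more..."

so the penalty element becomes the only feasible break at the '/'.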


> want to prevent this break from happening, as we can now use
> zero-width-spaces to explicitly insert breaking positions?

Good point. I'd say yes for '/'. This would add a burden to the user who
would have to modify the FO generation step to add ZWSP for URLs or
filenames; but we must also take into account cases where the user does
/not/ want the word to be split at '/' characters.
For hyphens, I would keep the current behavior, as this is the most
expected one IMO. And it can also be prevented by adding a zero-width
no-break space.
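
Just to make the idea concrete, here is a rough sketch of the kind of
rule I have in mind (purely hypothetical, not what TextLM does today):
U+200B marks an explicit break opportunity, and U+2060 (the word
joiner, i.e. the zero-width no-break character) suppresses the
automatic break after '/' or '-':

    public class BreakOpportunitySketch {
        /** May the text be broken after position i? (hypothetical rule) */
        static boolean breakAllowedAfter(String text, int i) {
            char c = text.charAt(i);
            boolean joined = i + 1 < text.length()
                    && text.charAt(i + 1) == '\u2060'; // word joiner
            if (c == '\u200B') return true;            // explicit ZWSP break
            return (c == '/' || c == '-') && !joined;  // current default
        }

        public static void main(String[] args) {
            System.out.println(breakAllowedAfter("a/b", 1));       // true
            System.out.println(breakAllowedAfter("a/\u2060b", 1)); // false
        }
    }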


> 2) The presence of an inline object larger than the available width
> makes the algorithm deactivate all the active nodes and then restart
> with a "second-hand" node, as no line can be built that does not
> overflow. The restarting node was chosen, in
> BreakingAlgorithm.findBreakingPoints(), between lastTooShort and
> lastTooLong, neither of them being a "good" breaking point. There is a
> lastDeactivated node chosen among the deactivated nodes but it was not
> used.
> 
> A deactivated node was previously an active one, so it is surely better
> than a node that "failed to qualify"; replacing either lastTooShort or
> lastTooLong (according to the adjustment) with lastDeactivated leads to
> a better set of breaks. However, this is not enough. The attached file
> small.20.pdf shows the result after fixing these first two problems.
> 
> 
> 3) At the moment, the LineLM can call findBreakingPoints() up to three
> times, the last one with a maximum adjustment ratio equal to 20. I came
> to the conclusion that this is really TOO much. I tried stopping after
> the second call (with max ratio = 5) and the result is much better (see
> attached file small.5.pdf).

Yes, 20 is probably too much. Perhaps we also need to differentiate the
case where no acceptable line breaking can be found because a box is too
long to even fit alone on one line. In such a case even a very high max
ratio won't help.
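
To put numbers on that: with the standard Knuth adjustment-ratio
formula (simplified sketch below), a box wider than the line yields a
ratio that no threshold can rescue, because there is nothing to
stretch or shrink:

    public class AdjustmentRatioSketch {
        /** r = (target - natural) / stretch, or / shrink when negative. */
        static double adjustmentRatio(int natural, int stretch, int shrink,
                                      int target) {
            int diff = target - natural;
            if (diff > 0) {
                return stretch > 0 ? (double) diff / stretch
                                   : Double.POSITIVE_INFINITY;
            }
            if (diff < 0) {
                return shrink > 0 ? (double) diff / shrink
                                  : Double.NEGATIVE_INFINITY;
            }
            return 0.0;
        }

        public static void main(String[] args) {
            // A lone 300pt-wide box on a 200pt line, nothing to shrink:
            System.out.println(adjustmentRatio(300, 0, 0, 200)); // -Infinity
            // A feasible line needs -1 <= r <= maxRatio, so raising
            // maxRatio from 5 to 20 changes nothing here.
        }
    }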


> A high maximum adjustment ratio means that the algorithm is allowed to
> stretch spaces a lot in order to find a set of breaks which is
> *globally* better; this means that it can choose some not-so-beautiful
> breaks in order to build a set spanning over a larger portion of the
> paragraph.
> 
> In our example: there can be a break just before the long URL (a line
> ending after "Consider:") only if we use an enormous adjustment ratio.
> With a smaller, more appropriate threshold, "Consider:" can no longer
> end a line, so the algorithm will restart from a previous point.
> 
> 
> In conclusion: the first two items are easily fixed, and I'm going to
> commit the changes in the afternoon (if there are no objections);
> concerning the question of the automatic break at "/" and "-" characters,
> I'll probably leave the code unchanged for the moment, until we decide what is
> best.
> 
> Concerning point #3, I'm going to have a closer look at the restarting
> mechanism ...

Yes, the current mechanism doesn't seem to be good enough, but I'm
wondering if we can find a better one. Currently a too-short/too-long
node replaces another one if it has fewer demerits. The number of
lines/pages handled so far isn't taken into account. So it is likely
that a too-short/too-long node ending an earlier line/page will be
preferred over a node going further in the Knuth sequence. Why should
that be the case?
In fact the main problem, I think, is to find the right heuristic to
select too-short/too-long nodes, in order to end up with the most
acceptable result. Easy to say...
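
As a starting point, here is the kind of heuristic I have in mind
(just a sketch; the Node fields are hypothetical stand-ins for the
node data in BreakingAlgorithm): prefer the fallback node that got
further into the Knuth sequence, and use demerits only to break ties
between nodes ending at the same position:

    public class FallbackHeuristicSketch {
        static class Node {
            final int position;         // index reached in the Knuth sequence
            final double totalDemerits;
            Node(int position, double totalDemerits) {
                this.position = position;
                this.totalDemerits = totalDemerits;
            }
        }

        /** Pick the better of two candidate restart nodes. */
        static Node betterFallback(Node a, Node b) {
            if (a == null) return b;
            if (b == null) return a;
            if (a.position != b.position) {
                return a.position > b.position ? a : b; // furthest first
            }
            return a.totalDemerits <= b.totalDemerits ? a : b;
        }

        public static void main(String[] args) {
            Node early = new Node(10, 50.0), late = new Node(40, 900.0);
            // The current demerits-only rule would keep 'early';
            // this heuristic keeps 'late' instead.
            System.out.println(betterFallback(early, late).position); // 40
        }
    }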

Also, may I suggest that you look at the Temp_Floats branch, and perhaps
even work on it instead of trunk? I've made quite heavy changes to
the breaking code that might be difficult to merge back into the trunk
if there are also changes there.


Cheers,
Vincent
