Vincent wrote:
The LineLM tries, in the first instance, to avoid using hyphenation
points, so the penalty is not taken into account. But this has the side
effect of using the first glue element as a feasible break (if the
penalty were a feasible break too, it would surely be a better one, thus
preventing the glue from actually being chosen).
I don't follow you: IIUC the glue-penalty-glue triplet is generated only
the second time, when the first breaking doesn't give acceptable
results? What do you mean by "the penalty is not taken into account"?
No, the sequence is always the same: from the beginning it represents the
hyphenation points too, but on the first call to findBreakingPoints()
there is a parameter saying that only non-hyphenated breaks should be
looked at.
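
A minimal sketch of that two-pass idea, assuming a simplified element model. The class, the `hyphenation` flag and the `legalBreaks` method are hypothetical names for illustration, not FOP's actual API. The sequence is built once; only the `allowHyphenation` parameter changes between passes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not FOP's real Knuth element classes.
final class TwoPassBreakSketch {
    enum Kind { BOX, GLUE, PENALTY }

    // Penalty value treated as "never break here" (assumption for this sketch).
    static final int INFINITE = 1000;

    static final class Element {
        final Kind kind;
        final int penalty;          // only meaningful for PENALTY elements
        final boolean hyphenation;  // true if the penalty marks a hyphenation point

        Element(Kind kind, int penalty, boolean hyphenation) {
            this.kind = kind;
            this.penalty = penalty;
            this.hyphenation = hyphenation;
        }
    }

    // Returns the indices that may be considered as breaks in this pass.
    static List<Integer> legalBreaks(List<Element> seq, boolean allowHyphenation) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < seq.size(); i++) {
            Element e = seq.get(i);
            if (e.kind == Kind.GLUE) {
                // A glue is a legal break only when it directly follows a box.
                if (i > 0 && seq.get(i - 1).kind == Kind.BOX) {
                    result.add(i);
                }
            } else if (e.kind == Kind.PENALTY) {
                // Hyphenation penalties are skipped unless the pass allows them.
                if (e.penalty < INFINITE && (allowHyphenation || !e.hyphenation)) {
                    result.add(i);
                }
            }
        }
        return result;
    }
}
```

For a box, glue, hyphenation penalty, glue, box sequence, the first pass (`allowHyphenation = false`) reports only the first glue, which is the side effect described above; the second pass reports both the glue and the penalty.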
Also, I don't see why the penalty would be preferred over the glue, as
it has a positive penalty value.
Choosing the glue as a break has the effect of losing its stretch and
shrink, so the adjustment ratio and the demerits would be higher. Now that you
make me think of it, this is surely true when the penalty has penalty value =
0, but could be false otherwise ... so we could check the penalty value
and add the additional penalty if it's >0.
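
To make the comparison concrete, here is a small sketch using the textbook Knuth-Plass formulas (badness = 100·|r|³, demerits = (linePenalty + badness + penalty)² for non-negative penalties). The class name, the line-penalty constant of 1 and the sample widths are assumptions for illustration; FOP's own computation may differ in its details.

```java
// Hypothetical helper; formulas follow the Knuth-Plass paper, not necessarily FOP.
final class DemeritsSketch {
    // Constant added to every line's badness (assumed to be 1 here).
    static final double LINE_PENALTY = 1.0;

    // Ratio by which the line's glue must stretch (> 0) or shrink (< 0).
    static double adjustmentRatio(double lineWidth, double naturalWidth,
                                  double stretch, double shrink) {
        double diff = lineWidth - naturalWidth;
        if (diff > 0) {
            return stretch > 0 ? diff / stretch : Double.POSITIVE_INFINITY;
        }
        if (diff < 0) {
            return shrink > 0 ? diff / shrink : Double.NEGATIVE_INFINITY;
        }
        return 0.0;
    }

    static double badness(double ratio) {
        return 100.0 * Math.pow(Math.abs(ratio), 3);
    }

    // Demerits of a line ending at a break with the given penalty value.
    static double demerits(double ratio, int penalty) {
        double base = LINE_PENALTY + badness(ratio);
        if (penalty >= 0) {
            double sum = base + penalty;
            return sum * sum;
        }
        // Negative (finite) penalties reduce the demerits.
        return base * base - (double) penalty * penalty;
    }
}
```

With a line width of 100 and a natural width of 90, breaking at the glue (line stretch 10, its own stretch lost) gives ratio 1 and demerits 10201. Breaking at the penalty (line stretch 20) gives ratio 0.5 and demerits 182.25 when the penalty value is 0, but 12882.25 when it is 100, so a positive penalty can indeed make the glue the cheaper break.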
want to prevent this break from happening, as we can now use
zero-width-spaces to explicitly insert breaking positions?
Good point. I'd say yes for '/'. This would add a burden to the user who
would have to modify the FO generation step to add ZWSP for URLs or
filenames; but we must also take into account cases where the user does
/not/ want the word to be split at '/' characters.
Ok
For hyphens, I would keep the current behavior, as this is the most
expected one IMO. And it can also be prevented by adding a zero-width
non-breaking space.
I'm afraid that, at the moment, a zero-width non-breaking space after a
"-" would not prevent the break from happening ... and handling it would not
be completely trivial (as the hyphen could be the last character of an
inline, and the ZWNBSP the first of another one).
Maybe we could move the handling of hyphen to the hyphenation phase, when
text is collected from all inline LMs.
Another question: should the hyphen characters in the text be feasible
breaks even if hyphenation is disabled?
At the moment, hyphen characters and hyphenation points are handled in the
same way, so a hyphen in the text can be a break only if hyphenate=true,
and only from the second call to findBreakingPoints() onwards.
Yes, 20 is probably too much. Perhaps we also need to distinguish the
case where no acceptable line-breaking can be found because of a box too
long to even fit alone on one line. In such a case even a very high max
ratio won't help.
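
One possible way to detect that case up front, sketched under the assumption that box widths are available as plain integers (the class and method names are hypothetical, not FOP's): since a box can neither stretch nor shrink, any box wider than the line makes a retry with a higher ratio pointless.

```java
// Hypothetical pre-check; not part of FOP's breaking code.
final class OverlongBoxSketch {
    // Returns the index of the first box wider than the line, or -1 if none.
    static int firstOverlongBox(int[] boxWidths, int lineWidth) {
        for (int i = 0; i < boxWidths.length; i++) {
            if (boxWidths[i] > lineWidth) {
                return i;
            }
        }
        return -1;
    }

    // Raising the maximum adjustment ratio can only help if every box fits.
    static boolean retryWithHigherRatioMayHelp(int[] boxWidths, int lineWidth) {
        return firstOverlongBox(boxWidths, lineWidth) < 0;
    }
}
```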
I agree.
I'm thinking about how this could be done efficiently: lowering the
threshold during the execution of the breaking algorithm is not that
simple (pruning the list of active nodes might not be enough, as it could
become empty).
Yes, the current mechanism doesn't seem to be good enough, but I'm
wondering if we can find a better one. Currently a too-short/too-long
node replaces another one if it has fewer demerits. The number of
lines/pages handled so far isn't taken into account. So it is likely
that a too-short/too-long node ending an earlier line/page will be
preferred over a node going further in the Knuth sequence. Why should
that be the case?
In fact the main problem I think is to find the right heuristic to
select too-short/too-long nodes, in order to end up with the most
acceptable result. Easy to say...
The use of lastDeactivated should lead to some improvements:
lastDeactivated is (already) updated using the compareNodes() method,
which compares the node position first, and then the demerits.
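
The ordering described here could look roughly like the sketch below. The `Node` class and the `better` method are hypothetical stand-ins rather than FOP's KnuthNode or compareNodes(), and the sketch assumes that a node further along the sequence is the preferred one.

```java
// Hypothetical stand-in for the "position first, then demerits" ordering.
final class NodeOrderSketch {
    static final class Node {
        final int position;         // index in the Knuth sequence
        final double totalDemerits; // accumulated demerits up to this node

        Node(int position, double totalDemerits) {
            this.position = position;
            this.totalDemerits = totalDemerits;
        }
    }

    // Returns the node worth remembering as the last deactivated one.
    static Node better(Node current, Node candidate) {
        if (current == null) {
            return candidate;
        }
        if (candidate == null) {
            return current;
        }
        // Position decides first: the node reaching further wins (assumption).
        if (candidate.position != current.position) {
            return candidate.position > current.position ? candidate : current;
        }
        // Same position: fewer demerits wins.
        return candidate.totalDemerits < current.totalDemerits ? candidate : current;
    }
}
```

Under this ordering a node with more demerits still wins if it covers more of the sequence, which addresses the concern that a node ending an earlier line/page gets preferred.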
At the moment, my understanding of this matter is this: lastTooShort and
lastTooLong should be used only when the algorithm couldn't find any good
break since the last restart; otherwise lastDeactivated is probably the
best restarting point, as it allows the creation of a few good lines /
pages.
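
As a sketch of that selection rule: the helper below is hypothetical, and it assumes that lastDeactivated is null exactly when no good break has been found since the last restart. The tie-break between the too-short and too-long nodes (fewer demerits wins) is also an assumption, since the right heuristic is still an open question here.

```java
// Hypothetical restart-point selection; not FOP's actual recovery code.
final class RestartNodeSketch {
    static final class Node {
        final String label;
        final double totalDemerits;

        Node(String label, double totalDemerits) {
            this.label = label;
            this.totalDemerits = totalDemerits;
        }
    }

    // Returns the node to restart from, or null if nothing is available.
    static Node chooseRestart(Node lastDeactivated, Node lastTooShort, Node lastTooLong) {
        // A good break was found since the last restart: keep the good lines.
        if (lastDeactivated != null) {
            return lastDeactivated;
        }
        // Otherwise fall back to the too-short / too-long candidates.
        if (lastTooShort == null) {
            return lastTooLong;
        }
        if (lastTooLong == null) {
            return lastTooShort;
        }
        // Assumed tie-break: take the candidate with fewer demerits.
        return lastTooShort.totalDemerits <= lastTooLong.totalDemerits
                ? lastTooShort : lastTooLong;
    }
}
```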
Also, may I suggest that you look at the Temp_Floats branch, and perhaps
even work on it instead of trunk? I've made quite heavy changes to
the breaking code that might be difficult to merge back into the trunk
if there are also changes there.
Oops, you are right!
I'm going to look at it and work on it.
Regards
Luca