Hi all,

My 2 cents, as I don't yet have a detailed understanding of all the
issues raised here.

Overall, we agree that the line- and page-breaking algorithms should be
modified to exchange some information. However, I have the feeling that
backtracking is actually not necessary.

I've written a few words about that on my GSoC wiki page about side-floats.
There are also Simon's comments in a preceding thread [1]. Line-breaking
could be driven by page-breaking, with both done at the same time. At
each iteration of line-breaking, the page context of the currently
considered active node may be passed to the line-breaking algorithm;
roughly:

create a node for page 0, line 0
for each legal linebreak do
    for each active node do
        considerLegalBreak(linebreak, page context of the active node)
        if this is a feasible line break
            record an active node, line level
            if this is a feasible page break
                record an active node, page level
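To make the loop above more concrete, here is a toy Java sketch. Everything in it is hypothetical, not FOP code: the `CombinedBreaker` and `Node` names, a fixed number of lines per page, a per-page IPD array (the last entry is reused for later pages), word widths that already include the following space, and squared leftover space as demerits. Each active node carries its page context, so the same legal break is measured against the IPD of page n or of page n+1 depending on which node it starts from.

```java
import java.util.ArrayList;
import java.util.List;

class CombinedBreaker {
    // Hypothetical active node: where the last line ended, plus its page context.
    record Node(int position, int page, int lineOnPage, double demerits,
                boolean endsPage, Node previous) {}

    // Total-fit over the whole input; returns the cheapest node at the end.
    static Node best(int[] wordWidths, int linesPerPage, int[] ipdPerPage) {
        List<Node> active = new ArrayList<>();
        active.add(new Node(0, 0, 0, 0.0, false, null));     // page 0, line 0
        for (int br = 1; br <= wordWidths.length; br++) {     // each legal line break
            List<Node> created = new ArrayList<>();
            for (Node node : active) {
                // Page context: a full page pushes the new line onto the next page.
                boolean nextPage = node.lineOnPage() == linesPerPage;
                int page = nextPage ? node.page() + 1 : node.page();
                int line = nextPage ? 1 : node.lineOnPage() + 1;
                int ipd = ipdPerPage[Math.min(page, ipdPerPage.length - 1)];
                int width = 0;
                for (int w = node.position(); w < br; w++) {
                    width += wordWidths[w];
                }
                if (width > ipd) {
                    continue;                                 // not a feasible line break
                }
                double demerits = node.demerits() + Math.pow(ipd - width, 2);
                // Line-level node; it is also a page-level node if it fills the page.
                created.add(new Node(br, page, line, demerits, line == linesPerPage, node));
            }
            active.addAll(created);
        }
        Node best = null;
        for (Node n : active) {
            if (n.position() == wordWidths.length
                    && (best == null || n.demerits() < best.demerits())) {
                best = n;
            }
        }
        return best;
    }
}
```

With `best(new int[]{3, 3, 3, 3}, 1, new int[]{6, 3})`, the first line can hold two words only because it sits on page 0 (IPD 6); every later line starts on a narrower page, so the best layout ends on page 2 with zero demerits. No node is ever deactivated here, which keeps the sketch short but is quadratic.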

In considerLegalBreak, we would have all the necessary information from
the current page-level active node:
- Would this line be the last line of the page/column? Then, if the
  current legal linebreak is a hyphen, it isn't a feasible page break.
- Would this be the last line of the last column, with a
  keep-together.within-page in effect? Then it isn't a feasible page
  break either.
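The two checks above could look roughly like this. This is a sketch under assumptions: the method and parameter names are hypothetical, the keep is treated as a hard ("always") keep that covers the lines on both sides of the break, and the enum mirrors the `auto | column | page` values of the XSL-FO hyphenation-keep property (`inherit` left out).

```java
class PageFeasibility {
    // Values of the XSL-FO hyphenation-keep property (inherit omitted).
    enum HyphenationKeep { AUTO, COLUMN, PAGE }

    // Hypothetical inputs read from the page-level active node.
    static boolean isFeasiblePageBreak(boolean lastLineOfColumn, boolean lastColumnOfPage,
                                       boolean endsWithHyphen, HyphenationKeep keep,
                                       boolean keepTogetherWithinPage) {
        if (!lastLineOfColumn) {
            return false;              // the line does not end a column: no break here
        }
        if (endsWithHyphen) {
            // column: both halves of the word must stay in one column;
            // page: they must stay on one page, so only the last column matters.
            if (keep == HyphenationKeep.COLUMN) return false;
            if (keep == HyphenationKeep.PAGE && lastColumnOfPage) return false;
        }
        // Simplification: the keep is assumed to span this break.
        if (lastColumnOfPage && keepTogetherWithinPage) {
            return false;
        }
        return true;
    }
}
```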

Depending on the current active node, a legal linebreak could end the
last line of page n, in which case the IPD of page n is to be
considered; or it could end the first line of page n+1, and then we
must take the IPD of page n+1.

We could modulate the degree of total-fit we want: for true total-fit,
we keep the active nodes for the whole document. Or, each time the end
of a page-sequence is reached, we stop the algorithm, choose the current
best layout (and can start creating the areas), and restart from
scratch at the next page-sequence. Or we do that each time a forced
page break is met.
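The three options differ only in where the best layout is committed and the active nodes are thrown away. A minimal sketch of that decision, with hypothetical enum and method names:

```java
class TotalFitScope {
    // How long active nodes are kept (hypothetical names).
    enum Scope { DOCUMENT, PAGE_SEQUENCE, FORCED_BREAK }

    // Points in the input where a commit may happen.
    enum Boundary { FORCED_PAGE_BREAK, END_OF_PAGE_SEQUENCE, END_OF_DOCUMENT }

    // True if the best layout so far should be chosen (areas can then be
    // created) and the algorithm restarted from scratch with fresh active nodes.
    static boolean commitAt(Scope scope, Boundary boundary) {
        return switch (boundary) {
            case END_OF_DOCUMENT -> true;
            case END_OF_PAGE_SEQUENCE -> scope != Scope.DOCUMENT;
            case FORCED_PAGE_BREAK -> scope == Scope.FORCED_BREAK;
        };
    }
}
```

The narrower the scope, the earlier areas can be created and memory released, at the cost of a less global optimum.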

We could choose to reset the line-level active nodes at the end of each
paragraph, and choose the number of lines leading to the optimal layout
for that paragraph (this is the current situation). Or, instead, just
select the best active node for each possible number of lines, and
discard the others; so there would usually be three active nodes for a
paragraph instead of one currently.
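The difference between the two end-of-paragraph policies can be sketched as follows; `Node` here is a hypothetical stand-in that only keeps the line count and the accumulated demerits.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ParagraphEndPruning {
    // Hypothetical active node reaching the end of the paragraph.
    record Node(int lineCount, double demerits) {}

    // Current situation: a single best node survives the paragraph.
    static Node bestOverall(List<Node> nodes) {
        Node best = null;
        for (Node n : nodes) {
            if (best == null || n.demerits() < best.demerits()) best = n;
        }
        return best;
    }

    // Alternative: one best node per possible number of lines.
    static Map<Integer, Node> bestPerLineCount(List<Node> nodes) {
        Map<Integer, Node> best = new TreeMap<>();
        for (Node n : nodes) {
            best.merge(n.lineCount(), n,
                    (kept, other) -> kept.demerits() <= other.demerits() ? kept : other);
        }
        return best;
    }
}
```

Keeping one node per line count lets the page breaker later pick, say, a slightly worse 4-line version of a paragraph if that avoids a bad page break.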

We could, each time a feasible page break is found, record it only if
its demerits are lower than those of the currently recorded page break
for the same number of pages (page-level best-fit). We could also do
that for paragraphs (line-level best-fit, for very simple documents).
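Page-level best-fit could amount to keeping, for each page count, only the cheapest feasible page break seen so far. A minimal sketch with hypothetical names (`PageLevelBestFit`, `PageBreak`), demerits assumed to be computed elsewhere:

```java
import java.util.HashMap;
import java.util.Map;

class PageLevelBestFit {
    // Hypothetical feasible page break: pages so far, break position, demerits.
    record PageBreak(int pageCount, int position, double demerits) {}

    private final Map<Integer, PageBreak> recorded = new HashMap<>();

    // Records the candidate only if it beats the break already kept
    // for the same number of pages; returns whether it was recorded.
    boolean offer(PageBreak candidate) {
        PageBreak current = recorded.get(candidate.pageCount());
        if (current != null && current.demerits() <= candidate.demerits()) {
            return false;
        }
        recorded.put(candidate.pageCount(), candidate);
        return true;
    }

    PageBreak get(int pageCount) {
        return recorded.get(pageCount);
    }
}
```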

So, in my opinion, and with the still limited knowledge I have of some
layout problems (balanced columns, several spans on a page...), this
should just be a matter of passing the right information to the
line-breaking algorithm and recording it in active nodes.

Hope that can give you further ideas,


2006/8/31, Jeremias Maerki:
I'm investigating what would be necessary to implement hyphenation-keep.
After some thought, I think this is one of those very mean properties
that feed back from page-breaking into line-breaking. IOW, when you
detect a page/column break at a line which is hyphenated, you'll
basically have to backtrack and redo the line breaking, disabling that
particular hyphenation possibility. You then have to redo the page
breaking, possibly having to backtrack again if another hyphenated line
ends up at the end of a column/page. Doesn't sound like a small change.
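For illustration, the backtracking approach can be reduced to a toy Java loop. The assumptions are loud ones: a greedy first-fit line breaker instead of total-fit, a fixed number of lines per page, every token boundary being a legal break (a hyphenation point where `hyphenAfter[i]` is true, a space otherwise), and hypothetical class and method names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BacktrackingHyphenationKeep {
    // Toy greedy line breaker; returns, per line, the index of its last token.
    static List<Integer> breakLines(int[] widths, boolean[] hyphenAfter,
                                    Set<Integer> disabled, int ipd) {
        List<Integer> breaks = new ArrayList<>();
        int start = 0;
        while (start < widths.length) {
            int width = 0;
            int lastAllowed = -1;
            for (int i = start; i < widths.length; i++) {
                width += widths[i];
                if (width > ipd) break;
                boolean isLast = i == widths.length - 1;
                if (isLast || !hyphenAfter[i] || !disabled.contains(i)) lastAllowed = i;
            }
            if (lastAllowed < 0) throw new IllegalStateException("no feasible line break");
            breaks.add(lastAllowed);
            start = lastAllowed + 1;
        }
        return breaks;
    }

    // Redo line and page breaking until no page ends with a hyphenated line.
    static List<Integer> layout(int[] widths, boolean[] hyphenAfter, int ipd, int linesPerPage) {
        Set<Integer> disabled = new HashSet<>();
        while (true) {
            List<Integer> breaks = breakLines(widths, hyphenAfter, disabled, ipd);
            Integer offending = null;
            // Last line of every page, skipping the final line (nothing follows it).
            for (int line = linesPerPage - 1; line < breaks.size() - 1; line += linesPerPage) {
                if (hyphenAfter[breaks.get(line)]) {
                    offending = breaks.get(line);
                    break;
                }
            }
            if (offending == null) return breaks;
            disabled.add(offending);   // backtrack: disable that hyphenation point
        }
    }
}
```

Every iteration disables one more hyphenation point, so the loop terminates, but each round reruns both line and page breaking from scratch, which is exactly the cost in question.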

The cheap way, of course, is to add penalty values to discourage page
breaks between hyphenated lines (when hyphenation-keep is activated) but
that could lead to ugly layout. It's certainly better to disable certain
hyphenation points based on feedback from page breaking but it obviously
means starting to backtrack into line breaking. Maybe the "changing
available IPD" problem also plays into this. As we've seen, it may be
necessary to redo certain line breaks based on events in page breaking.
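The penalty-based alternative only makes a hyphenated page end more expensive, it never forbids it. A toy sketch (hypothetical names, penalty value arbitrary) of why that can still produce the layout one wanted to avoid:

```java
import java.util.List;

class HyphenPagePenalty {
    // Hypothetical candidate page break after a given line.
    record Candidate(int line, double demerits, boolean endsWithHyphen) {}

    // Picks the cheapest candidate; with hyphenation-keep active, a break after
    // a hyphenated line just costs an extra penalty.
    static Candidate choose(List<Candidate> candidates, boolean hyphenationKeep, double penalty) {
        Candidate best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Candidate c : candidates) {
            double cost = c.demerits() + (hyphenationKeep && c.endsWithHyphen() ? penalty : 0.0);
            if (cost < bestCost) {
                bestCost = cost;
                best = c;
            }
        }
        return best;
    }
}
```

With a hyphenated candidate at demerits 5 and a clean one at 200, a penalty of 100 still selects the hyphenated break; only a penalty large enough to outweigh the alternative's demerits avoids it.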

Does anyone see a relatively simple way I have not yet seen? Or am I
more or less on track?

Another topic that we may have to address at some point is the
distinction between keeps at column level and keeps at page level. So
far, we can only map keeps at column level. I wonder how we would go about
an implementation here. It seems to me that the page breaker would have
to start being more clever.

Anyway, the important thing for me right now is to have an idea how
hyphenation-keep would have to be implemented so I can take an estimate
and determine dependencies of tasks.

Thanks for any ideas,
Jeremias Maerki
