On Jan 17, 2008, at 20:57, Simon Pepping wrote:
On Thu, Jan 17, 2008 at 12:27:11AM +0100, Andreas L Delmelle wrote:
Right now, the element list is constructed as the result of
recursive calls
to getNextChildLM.getNextKnuthElements().
/The/ return list upon which the page breaker operates is the one
that is
ultimately returned by the FlowLM.
Instead of that, I've been thinking in the direction of making it
a data
structure that exists 'physically' separate from the LMs.
This structure, created and maintained by the PageSequenceLM,
would be
passed down into an appendNextKnuthElementsTo() method.
The lower-level LMs can signal an interrupt to the ancestor LMs,
based on
information they get through the LayoutContext --forced breaks
being the
most prominent.
The FlowLM, instead of simply continuing the loop, could give
control back
to the PageSequenceLM, which can run the page breaker over the
list up to
that point.
I would rather pass a reference to the page breaker in the
getNextKnuthElements call. Each LM can then append Knuth elements in a
callback to the pagebreaker. At each such append callback, the page
breaker can decide to run the Knuth algorithm and ship pages. When
this callback finishes, the LM can continue. Running the Knuth
algorithm intermittently makes no sense in a total-fit algorithm.
Right. Running the algorithm intermittently may make no sense /in/ a
total-fit algorithm, the implementation of which currently takes for
granted the fact that a given sequence S will always be complete, so
we can immediately move on from end-of-layout to building the area
tree for the page-sequence. Suppose, however, that this will no
longer be guaranteed.
Also, I would see the basic strategy evolve in such a way that we
have some check on the list's eventual size to determine whether to
use best-fit or total-fit. Implementing this logic inside the LMs or
the breaking-algorithm seems out-of-place. As Jeremias mentioned, we
would have to have some mechanism for limiting memory consumption.
Keeping total-fit as the default strategy for the layout-engine is
fine by me, as long as we also can switch to best-fit at the
appropriate point. This point is unrelated to anything layout-
specific, so I was thinking that a separate data structure would help
make the implementation of such a mechanism much cleaner. If this
check becomes part of each childLM's getNextElements(), this might
turn out to become a pain to maintain... If we implement it as part
of the add() and remove() of that hypothetical structure, it remains
nicely separated from both the LM-logic and the breaking-algorithm.
Two places where it does not belong.
We would also need to be able to determine the likelihood of the
computed
page-breaks changing due to additional content, if the FlowLM stil
has
childLMs coming up.
The Knuth algorithm does not determine actual pagebreaks but feasible
pagebreaks. Feasible pagebreaks are only determined by the preceding
Knuth sequence. Following Knuth elements can contribute more feasible
pagebreaks, but they can not change already determined feasible
pagebreaks.
Only at the appropriate time is the best feasible
pagebreak selected, and are the actual pagebreaks determined. The
appropriate time is the end of the sequence in a total-fit
strategy. In the best-fit strategy it is the point where no further
feasible pagebreaks for the current page can be contributed.
Hmm... my bad, I think I expressed it incorrectly.
My interpretation is that, given incomplete sequences and
intermittent runs, the score/demerits for each feasible break (its
'feasibility') determined by a prior run could still change due to
additional childLMs having contributed more elements in the next.
There is no strict "point" where we switch from determining feasible
breaks to computing actual ones: the feasible breaks become 'more'
actual with each progression.
Basically, it's about questions like:
What happens if we give the algorithm a sequence S, and subsequently
a different sequence S' which actually is S plus some more elements
added to the tail?
How are the resulting feasible breaks influenced if the total-fit
algorithm is run first over S, then over S'?
Is there a basis for saying: "The feasible break at position P in S
is more/less likely to be considered as an actual break, when you
compare the results for the same sequence, with some more content
added, given the same context."
I have always somehow assumed there to be a threshold in the most
common cases. In the sense that certain (maybe in fact even the bulk
of) feasible breaks can already be excluded quite early, even if the
sequence consists of millions of elements. Simply because they would,
in intermittent runs, repeatedly be considered to be 'bad' breaks, or
because they become 'worse' than the previous run.
Something like this happens already, in a way, I think, albeit very
deeply embedded inside the algorithm, and a bit late too...
Once we start determining actual breaks, the further we progress in
the page-sequence, the more layout-possibilities that need to be
excluded. We simply cannot consider all of them, including the ones
that would lead to unreasonable results.
Yet another way to look at it: the result of a total-fit strategy is
the best-fit for the entire page-sequence, which can be viewed as the
result of multiple total-fits for subsets of the page-sequence,
especially where forced page-breaks are involved. If we place a
forced break-after="page" on the first block in a document, the total-
fit result for the first page can in general be determined long
before knowing anything else about the document.
Cheers
Andreas