On Jan 17, 2008, at 20:57, Simon Pepping wrote:

On Thu, Jan 17, 2008 at 12:27:11AM +0100, Andreas L Delmelle wrote:
Right now, the element list is constructed as the result of recursive calls
to getNextChildLM.getNextKnuthElements().
/The/ return list upon which the page breaker operates is the one that is
ultimately returned by the FlowLM.

Instead of that, I've been thinking in the direction of making it a data
structure that exists 'physically' separate from the LMs.
This structure, created and maintained by the PageSequenceLM, would be
passed down into an appendNextKnuthElementsTo() method.

The lower-level LMs can signal an interrupt to the ancestor LMs, based on information they get through the LayoutContext --forced breaks being the
most prominent.
The FlowLM, instead of simply continuing the loop, could give control back to the PageSequenceLM, which can run the page breaker over the list up to
that point.

I would rather pass a reference to the page breaker in the
getNextKnuthElements call. Each LM can then append Knuth elements in a
callback to the pagebreaker. At each such append callback, the page
breaker can decide to run the Knuth algorithm and ship pages. When
this callback finishes, the LM can continue. Running the Knuth
algorithm intermittently makes no sense in a total-fit algorithm.

Right. Running the algorithm intermittently may make no sense /in/ a total-fit algorithm, whose implementation currently takes for granted that a given sequence S will always be complete, so that we can move straight from end-of-layout to building the area tree for the page-sequence. Suppose, however, that this is no longer guaranteed.
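
Just to make sure I picture your suggestion correctly, I read it roughly as follows (a minimal sketch only; none of the names below refer to existing code):

  import java.util.List;

  import org.apache.fop.layoutmgr.KnuthElement;

  /* Hypothetical callback, passed down in the getNextKnuthElements()
     call instead of having the LMs return their lists; the page breaker
     (or an adapter around it) would implement it. */
  interface PageBreakerCallback {

      /* Called by an LM whenever it has a batch of elements ready.
         Before returning, the breaker may run the Knuth algorithm over
         what it has so far and ship the pages it can already ship;
         the LM then simply continues. */
      void appendElements(List<KnuthElement> elements);

      /* Signalled when no further elements will follow. */
      void noMoreElements();
  }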

Also, I would see the basic strategy evolve in such a way that we have some check on the list's eventual size to determine whether to use best-fit or total-fit. Implementing this logic inside the LMs or the breaking algorithm seems out of place. As Jeremias mentioned, we would need some mechanism for limiting memory consumption. Keeping total-fit as the default strategy for the layout-engine is fine by me, as long as we can also switch to best-fit at the appropriate point. That point is unrelated to anything layout-specific, so I was thinking that a separate data structure would help make the implementation of such a mechanism much cleaner.

If this check becomes part of each childLM's getNextElements(), it might turn out to be a pain to maintain... If we implement it as part of the add() and remove() of that hypothetical structure, it remains nicely separated from both the LM logic and the breaking algorithm, two places where it does not belong.
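
Roughly what I had in mind for that structure, just to show where such a check would live (again purely a sketch; the name ElementSequence and the threshold are made up):

  import java.util.ArrayList;
  import java.util.List;

  import org.apache.fop.layoutmgr.KnuthElement;

  /* Hypothetical structure, created by the PageSequenceLM and passed
     down into an appendNextKnuthElementsTo() method. The size/memory
     check lives in add(), outside both the LMs and the breaking
     algorithm. */
  class ElementSequence {

      private final List<KnuthElement> elements = new ArrayList<KnuthElement>();
      /* Above this size we would give up on total-fit and fall back to
         best-fit; the actual value or heuristic is an open question. */
      private final int totalFitLimit;
      private boolean totalFitStillFeasible = true;

      ElementSequence(int totalFitLimit) {
          this.totalFitLimit = totalFitLimit;
      }

      void add(KnuthElement element) {
          elements.add(element);
          if (elements.size() > totalFitLimit) {
              totalFitStillFeasible = false;
          }
      }

      void remove(KnuthElement element) {
          elements.remove(element);
      }

      boolean isTotalFitStillFeasible() {
          return totalFitStillFeasible;
      }

      List<KnuthElement> getElements() {
          return elements;
      }
  }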


We would also need to be able to determine the likelihood of the computed page-breaks changing due to additional content, if the FlowLM still has childLMs coming up.

The Knuth algorithm does not determine actual pagebreaks but feasible
pagebreaks. Feasible pagebreaks are only determined by the preceding
Knuth sequence. Following Knuth elements can contribute more feasible
pagebreaks, but they can not change already determined feasible
pagebreaks.

Only at the appropriate time is the best feasible
pagebreak selected, and are the actual pagebreaks determined. The
appropriate time is the end of the sequence in a total-fit
strategy. In the best-fit strategy it is the point where no further
feasible pagebreaks for the current page can be contributed.


Hmm... my bad, I think I expressed it incorrectly.
My interpretation is that, given incomplete sequences and intermittent runs, the score/demerits for each feasible break (its 'feasibility') as determined by a prior run could still change, due to additional childLMs contributing more elements in the next run.

There is no strict "point" where we switch from determining feasible breaks to computing actual ones: the feasible breaks become 'more' actual with each progression.

Basically, it's about questions like:
What happens if we give the algorithm a sequence S, and subsequently a different sequence S' which is actually S plus some more elements added to the tail? How are the resulting feasible breaks influenced if the total-fit algorithm is run first over S, then over S'? Is there a basis for a statement like: "The feasible break at position P in S is more/less likely to be considered as an actual break when you compare the results for the same sequence, with some more content added, given the same context"?

I have always somehow assumed there to be a threshold in the most common cases, in the sense that certain (maybe in fact even the bulk of) feasible breaks can already be excluded quite early, even if the sequence consists of millions of elements, simply because intermittent runs would repeatedly consider them 'bad' breaks, or because they turn out 'worse' than in the previous run. Something like this happens already, in a way, I think, albeit very deeply embedded inside the algorithm, and a bit late too... Once we start determining actual breaks, the further we progress in the page-sequence, the more layout possibilities need to be excluded. We simply cannot consider all of them, including the ones that would lead to unreasonable results.
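
The kind of pruning I mean between intermittent runs would be something like this (the FeasibleBreak class is only a stand-in for whatever node the algorithm keeps per feasible break, not an existing class):

  import java.util.Iterator;
  import java.util.List;

  /* Hypothetical sketch: drop feasible breaks whose demerits stay above
     some bound, so that the set carried over from one run to the next
     remains bounded. */
  class BreakPruning {

      static class FeasibleBreak {
          int position;
          double demerits;
      }

      static void prune(List<FeasibleBreak> active, double worstAcceptable) {
          for (Iterator<FeasibleBreak> it = active.iterator(); it.hasNext();) {
              /* A break that keeps looking 'bad' run after run is
                 unlikely to end up on the chosen path. */
              if (it.next().demerits > worstAcceptable) {
                  it.remove();
              }
          }
      }
  }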

Yet another way to look at it: the result of a total-fit strategy is the best-fit for the entire page-sequence, which can be viewed as the result of multiple total-fits for subsets of the page-sequence, especially where forced page-breaks are involved. If we place a forced break-after="page" on the first block in a document, the total-fit result for the first page can in general be determined long before knowing anything else about the document.
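
In code, that decomposition would look roughly like this (everything below is hypothetical, only meant to show the shape of it):

  import java.util.List;

  import org.apache.fop.layoutmgr.KnuthElement;

  /* Hypothetical sketch: with forced breaks, the element sequence
     splits into independent parts, and each part can be laid out with
     total-fit on its own. */
  abstract class ForcedBreakLayout {

      /* Split the element list at forced-break penalties. */
      abstract List<List<KnuthElement>> splitAtForcedBreaks(List<KnuthElement> elements);

      /* Run the total-fit algorithm over one part and ship its pages. */
      abstract void runTotalFitAndShipPages(List<KnuthElement> part);

      void layoutPageSequence(List<KnuthElement> elements) {
          for (List<KnuthElement> part : splitAtForcedBreaks(elements)) {
              /* Pages for this part can be shipped without knowing
                 anything about the parts that follow it. */
              runTotalFitAndShipPages(part);
          }
      }
  }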


Cheers

Andreas
