Re: page-breaking strategies and performance

Chris Bowditch Wed, 02 Mar 2005 03:29:02 -0800

Jeremias Maerki wrote:

Hi Jeremias,

I finally have Knuth's "Digital Typography" and have let myself be
enlightened by his well-written words. In [1] Simon outlined different strategies for
page-breaking, obviously closely following the different approaches
defined by Knuth. At first glance, I'd say that "best-fit" is probably
the obvious strategy to select, especially if TeX is happy with it.
Obviously, it can't find the optimal solution like this but the
additional overhead (memory and CPU power) of a look-ahead/total-fit
strategy is simply too much and unnecessary for things like invoices and
insurance policies which are surely some of the most popular use cases
of XSL-FO. Here, speed is extremely important. People writing
documentation (maybe using DocBook) or glossy stock reports have
additional requirements and don't mind the longer processing time and
additional memory requirements. This leads me to the question if we
shouldn't actually implement two page-breaking strategies (in the end,
not both right now). For a speed-optimized algorithm, we could even
think about ignoring side-floats.

We have dozens of customers using an XSL-FO solution, and I can confirm that invoices and insurance policies are a common use case for XSL-FO. Many companies treat performance as a priority, and none of our customers use side-floats or are even thinking about using them, so optimizing for speed by ignoring side-floats sounds like a good idea! This is just my 2 cents, though, and may conflict with other people's wishes.
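
To make the trade-off concrete, here is a minimal sketch of a one-pass page breaker in the spirit of "best-fit": it fills each page as far as it will go and never revisits an earlier break. It assumes rigid line heights with no glue, floats or keep conditions, and the name fill_pages and its parameters are hypothetical, not FOP code.

```python
def fill_pages(heights, page_height):
    """Greedy, no-look-ahead page fill (a simplification of best-fit).

    heights     -- heights of the lines/blocks, in document order
    page_height -- available block-progression dimension per page
    Returns a list of pages, each a list of the heights placed on it.
    """
    pages = []
    current = []
    used = 0
    for h in heights:
        # Break before this element if it no longer fits on the page.
        # An element taller than the page still gets a page of its own.
        if current and used + h > page_height:
            pages.append(current)
            current = []
            used = 0
        current.append(h)
        used += h
    if current:
        pages.append(current)
    return pages


if __name__ == "__main__":
    print(fill_pages([10, 10, 10, 10, 10], 25))
```

A total-fit strategy would instead keep every feasible break open until the end of the page sequence, which is where the extra memory and CPU cost comes from.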


Obviously, in this model we would have to make sure that we use a common
model for both strategies. For example, we still have to make sure that
the line layout gets information on the available IPD on each line, but
probably this will not be a big problem to include later.

An enhanced/adjusted box/glue/penalty model sounds like a good idea to
me especially since Knuth hints at that in his book, too. There's also a
question if part of the infrastructure from line breaking can be reused
for page breaking, but I guess rather not.

It is probably best to write the page-breaking algorithm from scratch, but the line-breaking code can be reviewed for ideas.
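
For reference, a minimal sketch of what a box/glue/penalty sequence could look like when applied to page breaking. The class names, the INFINITE constant and the legal_breaks helper are all assumptions for illustration, not existing FOP classes; the break rule is the one Knuth gives for line breaking (a break is legal at a finite penalty, or at glue that directly follows a box).

```python
from dataclasses import dataclass

# Assumed convention: a penalty at or above this value forbids a break.
INFINITE = 10000


@dataclass
class Box:
    height: int  # rigid content, e.g. one laid-out line


@dataclass
class Glue:
    height: int       # natural space between boxes
    stretch: int = 0  # how much the space may grow
    shrink: int = 0   # how much the space may shrink


@dataclass
class Penalty:
    value: int  # cost of breaking here; INFINITE means "never break"


def legal_breaks(elements):
    """Return the indices at which a page break is allowed."""
    breaks = []
    for i, el in enumerate(elements):
        if isinstance(el, Penalty) and el.value < INFINITE:
            breaks.append(i)
        elif isinstance(el, Glue) and i > 0 and isinstance(elements[i - 1], Box):
            breaks.append(i)
    return breaks


if __name__ == "__main__":
    sequence = [
        Box(12),
        Penalty(INFINITE),  # e.g. a keep: no break after the first line
        Box(12),
        Glue(6, stretch=3, shrink=1),
        Box(12),
        Penalty(0),
    ]
    print(legal_breaks(sequence))
```

Both a fast strategy and a look-ahead strategy could consume the same element list, which is the point of keeping a common model.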


As for the plan to implement a new page-breaking mechanism: I've got to
do it now. :-) I'm sorry if this may put some pressure on some of you.
I'm also not sure if I'm fit already to tackle it, but I've got to
do it anyway. Since I don't want to work with a series of patches like
you guys did earlier, I'd like to create a branch to do that on as soon
as we've agreed on a strategy. Any objections to that?

If we are going to branch the code for this, then we need to make sure we have a plan to merge the branch back once we are confident in the new page-breaking algorithm. This plan (which should be agreed before branching takes place) should include an acceptance procedure, e.g. will a single -1 be able to prevent the code from being merged back? We don't want to end up with another alt-design, which eventually moved to SourceForge!

Chris
