Simon Pepping wrote:
I do not have much time to look into your problem, so I will just try to give a quick answer.
In my view the current BP setup is not able to generate good page break decisions; it can only do a first-fit algorithm. From your account, BPs are also overloaded to signal the completion of a page even when they do not really end an area. Your hack is indeed a hack, but from a quick inspection I would say that it properly marks the overloaded nature of BPs.
I have written down a proposal for a different page break decision strategy. I put my description on the wiki, http://wiki.apache.org/xmlgraphics-fop/PageLayout. I believe it serves two goals: 1. Enabling smarter page break algorithms. 2. Simplifying the addAreas calls, and especially their iteration over the collected BPs.
I have not had time to implement this, and therefore have also had no time to detect the flaws in my proposal. I would not mind if someone else implemented it.
Using your description as a jumping-off point, here are my ideas for page breaking. I suppose they are even more pie-in-the-sky, since I haven't yet written anything down about them.
The algorithm that the PageLM uses is a slightly modified Knuth algorithm (there is no need to maintain a fitness class, and it is able to decide on a break when there are N pages of lookahead). The elements that the LMs produce are boxes (for lines), spaces and penalties. They are not returned from the LMs but pushed by each LM into the pageLM:
    parent.generateElement(new Space(resolveBefore()));
    parent.generateElement(new Box(lineHeight));
The LMs also push start and end markers, so that the ordered list of elements in the pageLM is actually a flattened tree and can be used directly for the creation of areas (so no more Position tree). The same flattened-tree representation is used for inline content.
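To make the push model a bit more concrete, here is a minimal sketch in Java. Everything in it is an assumption made for illustration: the Element class, the Kind enum, the stand-in PageLM and the "outer"/"inner" block names are hypothetical and are not existing FOP classes. The push order is hard-coded rather than produced by a real traversal of the LM tree. It only shows that start and end markers turn the pushed elements into a flattened tree whose nesting can be checked with a stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class PushModelSketch {
    // Hypothetical element kinds: START/END are the markers, the others are Knuth-style elements.
    enum Kind { START, END, BOX, SPACE, PENALTY }

    static final class Element {
        final Kind kind;
        final String owner; // name of the LM that pushed the element
        final int value;    // height for BOX/SPACE, penalty for PENALTY, unused for markers

        Element(Kind kind, String owner, int value) {
            this.kind = kind;
            this.owner = owner;
            this.value = value;
        }
    }

    // Stand-in for the pageLM: it only records what is pushed into it.
    static final class PageLM {
        final List<Element> elements = new ArrayList<>();

        void generateElement(Element e) {
            elements.add(e);
        }
    }

    // Pushes the elements of an outer block that contains one line and a nested block with two lines.
    static PageLM demo() {
        PageLM page = new PageLM();
        page.generateElement(new Element(Kind.START, "outer", 0));
        page.generateElement(new Element(Kind.SPACE, "outer", 6));
        page.generateElement(new Element(Kind.BOX, "outer", 12));
        page.generateElement(new Element(Kind.START, "inner", 0));
        page.generateElement(new Element(Kind.BOX, "inner", 12));
        page.generateElement(new Element(Kind.PENALTY, "inner", 100));
        page.generateElement(new Element(Kind.BOX, "inner", 12));
        page.generateElement(new Element(Kind.END, "inner", 0));
        page.generateElement(new Element(Kind.END, "outer", 0));
        return page;
    }

    // Checks that every END marker closes the most recently opened START marker.
    static boolean isBalanced(List<Element> elements) {
        Deque<String> open = new ArrayDeque<>();
        for (Element e : elements) {
            if (e.kind == Kind.START) {
                open.push(e.owner);
            } else if (e.kind == Kind.END) {
                if (open.isEmpty() || !open.pop().equals(e.owner)) {
                    return false;
                }
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        PageLM page = demo();
        System.out.println(page.elements.size() + " elements, balanced: " + isBalanced(page.elements));
    }
}
```

Because the markers nest properly, no separate tree structure is needed to know which LM an element belongs to at any point in the list.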
The elements are pushed to the pageLM during a non-recursive traversal of the LM tree. The areas are created during a non-recursive traversal of the flattened element tree. During area creation the parent LMs are kept on an external stack; at the end of a page the stack represents the areas that are continued on the next page, and the stack is used as the starting point for creatin
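The role of the external stack can be sketched as follows. This is only an illustration under my own assumptions: the flattened elements are encoded as plain strings ("<name" for a start marker, ">name" for an end marker, "box" for a line), and the openAfter method is hypothetical. It walks the list iteratively up to a chosen break and reports which areas are still open there.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class AreaStackSketch {
    // Returns the names of the LMs whose areas are still open after the first
    // `count` elements have been consumed, outermost first.
    static List<String> openAfter(List<String> elements, int count) {
        Deque<String> stack = new ArrayDeque<>();
        for (int i = 0; i < count; i++) {
            String e = elements.get(i);
            if (e.startsWith("<")) {
                stack.push(e.substring(1)); // start marker: a new area is opened
            } else if (e.startsWith(">")) {
                stack.pop();                // end marker: the innermost area is finished
            }
            // "box": a line area would be added to the area on top of the stack here
        }
        // ArrayDeque iterates from the top of the stack (innermost) downwards.
        List<String> open = new ArrayList<>(stack);
        Collections.reverse(open);
        return open;
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList(
                "<outer", "box", "<inner", "box", "box", ">inner", ">outer");
        // A page break after the fourth element falls inside both blocks.
        System.out.println(openAfter(elements, 4));
    }
}
```

With a break after the fourth element both "outer" and "inner" remain on the stack, so both would have to be continued on the next page. After all seven elements the stack is empty.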
A significant drawback is that a Knuth-based page break algorithm is difficult to explain and justify, just as it is difficult to explain why Knuth line breaking does what it does by looking at individual lines in the output.
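A small sketch may show why such decisions are hard to justify locally. It is a heavily simplified stand-in for a Knuth-style breaker, not the proposed algorithm: it only knows box heights (no spaces, penalties or lookahead limit), it assumes every line fits on a page by itself, and the cost of a page is simply the square of its leftover height, with the last page free. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PageBreakSketch {
    // First fit: fill each page until the next line no longer fits.
    // Returns the break positions as "number of lines placed so far".
    static List<Integer> firstFit(int[] heights, int pageHeight) {
        List<Integer> breaks = new ArrayList<>();
        int used = 0;
        for (int i = 0; i < heights.length; i++) {
            if (used > 0 && used + heights[i] > pageHeight) {
                breaks.add(i);
                used = 0;
            }
            used += heights[i];
        }
        breaks.add(heights.length);
        return breaks;
    }

    // Total fit: dynamic programming over all feasible breaks, minimising total cost.
    static List<Integer> totalFit(int[] heights, int pageHeight) {
        int n = heights.length;
        long[] best = new long[n + 1]; // best[i]: minimal cost with a break after line i
        int[] prev = new int[n + 1];   // prev[i]: position of the preceding break
        Arrays.fill(best, Long.MAX_VALUE);
        best[0] = 0;
        for (int i = 1; i <= n; i++) {
            int used = 0;
            for (int j = i - 1; j >= 0; j--) { // candidate page holds lines j..i-1
                used += heights[j];
                if (used > pageHeight) break;
                if (best[j] == Long.MAX_VALUE) continue;
                long slack = pageHeight - used;
                long cost = (i == n) ? 0 : slack * slack; // the last page is free
                if (best[j] + cost < best[i]) {
                    best[i] = best[j] + cost;
                    prev[i] = j;
                }
            }
        }
        List<Integer> breaks = new ArrayList<>();
        for (int i = n; i > 0; i = prev[i]) {
            breaks.add(0, i);
        }
        return breaks;
    }

    // Sum of squared leftover heights over all pages except the last.
    static long cost(int[] heights, int pageHeight, List<Integer> breaks) {
        long total = 0;
        int start = 0;
        for (int k = 0; k < breaks.size() - 1; k++) {
            int used = 0;
            for (int i = start; i < breaks.get(k); i++) used += heights[i];
            long slack = pageHeight - used;
            total += slack * slack;
            start = breaks.get(k);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] heights = {4, 4, 2, 7, 9};
        System.out.println("first fit: " + firstFit(heights, 10));
        System.out.println("total fit: " + totalFit(heights, 10));
    }
}
```

For line heights 4, 4, 2, 7, 9 and a page height of 10, first fit breaks after lines 3 and 4 (cost 9), while total fit breaks after lines 2 and 4 (cost 5). Looking at the first page alone, the total-fit result seems wrong, because the third line would have fitted exactly; the choice only makes sense once the second page is taken into account.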