On Jan 5, 2007, at 16:20, Jeremias Maerki wrote:

> Adding page breaks will not be enough, BTW. But you already noticed that.
> FOP can currently only release memory at the end of a page-sequence. So
> instead of creating page-breaks, try to restart a new page-sequence. The
> memory usage should drop considerably.

If I remember correctly, that was precisely the problem, since Cliff's report consists of one giant table. It's supposed to look like one uninterrupted flow, so figuring out where the page-sequences should end is next to impossible... (or IOW: sorting that out kind of defeats the purpose of using a formatter to compute the page-breaks) :/

> There's also a little class (CachedRenderPagesModel) which could
> theoretically be used instead of the default RenderPagesModel. It allows
> temporarily off-loading rendered pages to disk if they can't be rendered
> right away. But this is not actively tested and does not help with the
> memory consumption of the FO tree, which probably represents the
> largest part in your case.

The only way I see FOP ever getting close to resolving the issue of arbitrarily sized page-sequences is if the overall processing is 'slightly' modified (in quotes, since it seems like only a small change, but it would still be quite some work for one person).

The redesign was ultimately meant to modularize FOP. Now that the fo-tree and the layout engine have been successfully extracted into separate modules, it seems time to revisit the way they work together. Currently, we have two monolithic modules performing their respective operations in sequential order. One module (layout) can't start until the other (fo-tree) has reached a critical boundary (FOEventHandler.endPageSequence()), and vice versa, the fo-tree can't continue until layout for a page-sequence has finished.
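
To make that coupling concrete, here's a deliberately over-simplified sketch of the current hand-off (the class is just a stand-in for FOEventHandler/AreaTreeHandler; the bodies are illustrative, not the actual code):

  // Over-simplified stand-in for the current FOEventHandler/AreaTreeHandler
  // hand-off; not the real FOP code.
  class SequentialCouplingSketch {

      /** Fired by the fo-tree builder once a whole fo:page-sequence is parsed. */
      void endPageSequence(Object pageSequence) {
          // Only now does layout start; the complete FO subtree for the
          // page-sequence stays referenced until this call returns.
          layoutWholeSequence(pageSequence);
      }

      void layoutWholeSequence(Object pageSequence) {
          // breaking + adding areas for every page of the sequence,
          // all in one go (cf. AbstractBreaker.doLayout())
      }
  }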

Very briefly put: the key would be to implement AreaTreeHandler.endBlock(). Use that event to start/resume the layout loop (ideally this loop should run in a separate thread, so there would be real performance boosts on MP systems), and use endPageSequence() instead only to perform one finishing pass over the whole sequence.
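
Purely to show the shape of what I mean (none of this is existing code; apart from endBlock()/endPageSequence() the names are made up), it would amount to a producer/consumer split between the fo-tree thread and a layout thread:

  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  // Hypothetical sketch: feed each finished fo:block to a layout loop instead
  // of waiting for the end of the page-sequence.
  class IncrementalHandlerSketch {

      private static final Object END_OF_SEQUENCE = new Object();

      private final BlockingQueue<Object> finishedBlocks =
              new LinkedBlockingQueue<Object>();

      private final Thread layoutLoop = new Thread(new Runnable() {
          public void run() {
              try {
                  while (true) {
                      Object block = finishedBlocks.take();
                      if (block == END_OF_SEQUENCE) {
                          break;
                      }
                      // append the block's element list and re-run the breaker
                      // (see the incremental-breaking sketch further down)
                  }
              } catch (InterruptedException ie) {
                  Thread.currentThread().interrupt();
              }
          }
      });

      void startPageSequence() {
          layoutLoop.start();
      }

      void endBlock(Object block) {
          finishedBlocks.offer(block);        // start/resume the layout loop
      }

      void endPageSequence() throws InterruptedException {
          finishedBlocks.offer(END_OF_SEQUENCE);
          layoutLoop.join();
          // ... then one finishing pass over the whole sequence
      }
  }

On a single processor this wouldn't gain much by itself, but it decouples the two loops; on MP systems the fo-tree build and layout could genuinely run in parallel.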

Such a change could bring us closer to enhancing FOP in other areas as well. Multiple endBlock() events each offer an opportunity for the PageSequenceLM to record available IPD changes, take into account footnotes/floats associated with a block etc.

Rough sketch:
At the very first endBlock() the parent FlowLM and PageSequenceLM are instantiated, and the first block-sequence is created. The breaker is run a first time, storing the resulting active nodes. On every subsequent occurrence of the event, the ancestor LMs and a set of active nodes are already present, a sequence for the current block is added, and the breaker is run again... As such, the page-breaking algorithm would run incrementally, performing multiple passes over the same block-sequences.
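
In toy form (real blocks are Knuth element lists, not plain heights, and the cost function below is invented just for the illustration), incrementally extending the set of feasible break points could look like this:

  import java.util.ArrayList;
  import java.util.List;

  // Toy model of incremental page breaking: each blockFinished() call extends
  // the set of break candidates ("active nodes") instead of recomputing from
  // scratch. Not FOP code; heights and cost function are invented.
  class IncrementalBreakerSketch {

      private final double pageBpd;   // available block-progression dimension per page
      private final List<Double> cumHeight = new ArrayList<Double>();  // cumulative content height
      private final List<Double> bestCost = new ArrayList<Double>();   // best cost of breaking after block i

      IncrementalBreakerSketch(double pageBpd) {
          this.pageBpd = pageBpd;
          cumHeight.add(0.0);     // sentinel: a "break" before the first block
          bestCost.add(0.0);
      }

      /** Called for every finished block (the endBlock() event in the proposal). */
      void blockFinished(double blockHeight) {
          double total = cumHeight.get(cumHeight.size() - 1) + blockHeight;
          cumHeight.add(total);

          // Only previous break points whose page would still fit stay "active".
          double best = Double.POSITIVE_INFINITY;
          for (int i = 0; i < bestCost.size(); i++) {
              double pageContent = total - cumHeight.get(i);
              if (pageContent <= pageBpd) {
                  double waste = pageBpd - pageContent;
                  best = Math.min(best, bestCost.get(i) + waste * waste);
              }
          }
          bestCost.add(best);     // stored for the next pass; earlier results are reused
      }

      double costOfBreakingHere() {
          return bestCost.get(bestCost.size() - 1);
      }

      public static void main(String[] args) {
          IncrementalBreakerSketch breaker = new IncrementalBreakerSketch(100.0);
          for (double h : new double[] {40, 35, 50, 20, 60}) {
              breaker.blockFinished(h);
              System.out.println("cost so far: " + breaker.costOfBreakingHere());
          }
      }
  }

The point being: each new block only needs to be evaluated against the break points that are still reachable from it, so nothing that was computed for earlier blocks has to be thrown away between passes.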

As you can see from the simplistic sketch, I'm still a bit unsure about the specifics, but if all goes well, in the most straightforward cases, some LMs can begin adding their areas long before the physical end-of-page-sequence is reached. If that also implies they can release the reference to their FO (and instruct the FOTree to release the reference as well via FONode.removeChild()), large parts of the FOTree can be garbage-collected much sooner than they are now. Think of the content of block-containers, non-marker parts of the static-content, table-headers/-footers. Even large text-blocks: note that the TextLM currently creates a copy of the corresponding FOText's char array, while the original happily occupies the same amount of memory.
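
For the text case, something like this is what I mean (FOText and FONode.removeChild() are the real names; the rest is made up for the sketch):

  // Illustrative only: once the layout side has taken its own copy of the
  // characters, the fo-tree side could drop its reference so the original
  // char[] becomes garbage-collectable. In FOP the parent would additionally
  // drop the node itself via FONode.removeChild().
  class FOTextSketch {

      private char[] ca;                      // the parsed character data

      FOTextSketch(String text) {
          this.ca = text.toCharArray();
      }

      /** Roughly what TextLM does today: work on its own copy. */
      char[] copyForLayout() {
          char[] copy = new char[ca.length];
          System.arraycopy(ca, 0, copy, 0, ca.length);
          return copy;
      }

      /** Hypothetical: after layout has its copy, free the original. */
      void releaseText() {
          this.ca = null;
      }
  }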

The overall changes would be far from trivial, AFAICT, but I'd love to see some more brainstorming in this direction. The biggest problem, if I'm correct, is that AbstractBreaker.doLayout() currently performs everything in one go.



Cheers,

Andreas
