On Jan 5, 2007, at 16:20, Jeremias Maerki wrote:
Adding page breaks will not be enough, BTW. But you already noticed
that.
FOP can currently only release memory at the end of a page-sequence. So
instead of creating page-breaks, try to restart a new page-sequence. The
memory usage should drop considerably.
If I remember correctly, that was precisely the problem, since
Cliff's report consists of one giant table. It's supposed to look
like one uninterrupted flow, so figuring out where the page-sequences
should end is next to impossible... (or IOW: sorting that out kind of
defeats the purpose of using a formatter to compute the page-breaks) :/
There's also a little class (CachedRenderPagesModel) which could
theoretically be used instead of the default RenderPagesModel. It allows
rendered pages to be temporarily off-loaded to disk if they can't be
rendered right away. But this is not actively tested and does not help
with the memory consumption of the FO tree, which probably accounts for
the largest part in your case.
The only way I see FOP ever getting close to resolving the issue of
arbitrarily sized page-sequences is if the overall processing is
'slightly' modified (quoted, since it seems like only a small change,
but it would still be quite some work for one man).
The redesign was ultimately meant to modularize FOP. Now that the
fo-tree and the layout engine have been successfully extracted into
separate modules, it seems like it's time to revisit the way they work
together.
Currently, we have two monolithic modules performing their respective
operations in sequential order. One module (layout) can't start until
the other (fo-tree) has reached a critical boundary
(FOEventHandler.endPageSequence()), and vice versa, the fo-tree can't
continue until layout for a page-sequence has finished.
Very briefly put: the key would be to implement
AreaTreeHandler.endBlock().
Use that event to start/resume the layout-loop (ideally this loop
should run in a separate thread, so there would be real performance
boosts on MP systems), and use endPageSequence() only to perform one
finishing pass over the whole sequence.
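
As a toy illustration of that hand-off, here is a minimal Java sketch. None of it is FOP code: ToyLayoutHandler, its queue and the string "areas" are made up for the example. endBlock() only feeds a layout loop running in its own thread, and endPageSequence() is left with the finishing pass. It assumes blocks can simply be laid out in arrival order, which real layout would not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for an event handler; not the real AreaTreeHandler. */
class ToyLayoutHandler {
    // Sentinel value telling the layout loop that the sequence has ended.
    private static final String END = "\u0000END";

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    // Written only by the worker thread; read by the caller after join().
    private final List<String> laidOut = new ArrayList<>();
    private final Thread worker = new Thread(this::layoutLoop, "layout-loop");

    ToyLayoutHandler() {
        worker.start();
    }

    /** Called by the (toy) FO tree builder each time a block is complete. */
    void endBlock(String blockId) {
        pending.add(blockId);
    }

    /** Ends the sequence, waits for the loop, then adds one finishing entry. */
    List<String> endPageSequence() {
        pending.add(END);
        try {
            worker.join(); // after join(), the worker's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while finishing layout", e);
        }
        List<String> result = new ArrayList<>(laidOut);
        result.add("finish:" + laidOut.size());
        return result;
    }

    /** Layout loop: sleeps in take() until the next block event arrives. */
    private void layoutLoop() {
        try {
            for (String b = pending.take(); !END.equals(b); b = pending.take()) {
                laidOut.add("laid-out:" + b);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class ToyLayoutDemo {
    public static void main(String[] args) {
        ToyLayoutHandler handler = new ToyLayoutHandler();
        handler.endBlock("b1");
        handler.endBlock("b2");
        // prints [laid-out:b1, laid-out:b2, finish:2]
        System.out.println(handler.endPageSequence());
    }
}
```

The point of the queue is that the producer (fo-tree) never waits for layout, and the layout thread only wakes up when there is something to do.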
Such a change could bring us closer to enhancing FOP in other areas
as well.
Multiple endBlock() events each offer an opportunity for the
PageSequenceLM to record available IPD changes, take into account
footnotes/floats associated with a block etc.
Rough sketch:
At the very first endBlock() the parent FlowLM and PageSequenceLM are
instantiated, and the first block-sequence is created. The breaker is
run a first time, storing the resulting active nodes.
On every subsequent occurrence of the event, the ancestor LMs and a set
of active nodes are already present, a sequence for the current block is
added, and the breaker is run again...
As such, the page-breaking algorithm would run incrementally,
performing multiple passes over the same block-sequences.
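
To show what "keep the active nodes and run the breaker again" could look like, here is a deliberately simplified Java sketch. It is not FOP's breaking algorithm: IncrementalBreaker is a hypothetical class, blocks are reduced to a single height, the cost of a page is assumed to be its squared leftover space, and the last page is assumed to cost nothing. Unlike the idea above, it also looks at each new block only once instead of making multiple passes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical incremental page breaker over block heights (toy model). */
class IncrementalBreaker {
    private final int pageHeight;
    // Node k stands for "a page starts at block k"; node 0 is the start.
    private final List<Integer> prefix = new ArrayList<>();   // height of blocks 0..k-1
    private final List<Long> demerits = new ArrayList<>();    // best cost to reach node k
    private final List<Integer> previous = new ArrayList<>(); // best predecessor of node k
    private final List<Integer> active = new ArrayList<>();   // nodes a page may still start at

    IncrementalBreaker(int pageHeight) {
        this.pageHeight = pageHeight;
        prefix.add(0);
        demerits.add(0L);
        previous.add(-1);
        active.add(0);
    }

    /** Adds one block and re-runs the breaker against the stored active nodes. */
    void endBlock(int height) {
        if (height < 0 || height > pageHeight) {
            throw new IllegalArgumentException("block does not fit on a page: " + height);
        }
        int node = prefix.size();
        int total = prefix.get(node - 1) + height;
        long best = Long.MAX_VALUE;
        int bestPrev = -1;
        for (Iterator<Integer> it = active.iterator(); it.hasNext();) {
            int start = it.next();
            int used = total - prefix.get(start);
            if (used > pageHeight) {
                it.remove(); // a page starting here can no longer hold the content
                continue;
            }
            long slack = pageHeight - used;
            long cost = demerits.get(start) + slack * slack;
            if (cost < best) {
                best = cost;
                bestPrev = start;
            }
        }
        prefix.add(total);
        demerits.add(best);
        previous.add(bestPrev);
        active.add(node);
    }

    int activeNodeCount() {
        return active.size();
    }

    /** Finishing pass: picks the cheapest start for the last page and backtracks. */
    List<Integer> finish() {
        int end = prefix.size() - 1;
        int bestStart = -1;
        long best = Long.MAX_VALUE;
        for (int start : active) {
            if (start < end && demerits.get(start) < best) {
                best = demerits.get(start);
                bestStart = start;
            }
        }
        List<Integer> breaks = new ArrayList<>();
        for (int n = bestStart; n > 0; n = previous.get(n)) {
            breaks.add(n);
        }
        Collections.reverse(breaks);
        return breaks; // indices of blocks that start a new page
    }
}
```

With a page height of 10 and four blocks of height 4, calling endBlock(4) four times and then finish() yields [2]: blocks 0-1 go on the first page, blocks 2-3 on the second. Only three nodes are still active at that point, which is what would allow earlier parts of the sequence to be let go.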
As you can see from the simplistic sketch, I'm still a bit unsure
about the specifics, but if all goes well, in the most
straightforward cases, some LMs can begin adding their areas long
before the physical end-of-page-sequence is reached. If that also
implies they can release the reference to their FO (and instruct the
FOTree to release the reference as well via FONode.removeChild()),
large parts of the FOTree can be garbage-collected much sooner than
they are now.
Think of the content of block-containers, non-marker parts of the
static-content, table-headers/-footers. Even large text-blocks: note
that the TextLM currently creates a copy of the corresponding
FOText's char array, while the original happily occupies the same
amount of memory.
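
The release step could be pictured roughly as follows. This is a toy model in Java, not FOP's classes: ToyFoNode and ToyLayoutManager are hypothetical, and only the name removeChild() is borrowed from the FONode method mentioned above. Once the layout manager has produced its areas, it detaches the FO from its parent and drops its own reference, so the whole subtree becomes unreachable and eligible for garbage collection.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal FO tree node. */
class ToyFoNode {
    final String name;
    private ToyFoNode parent;
    private final List<ToyFoNode> children = new ArrayList<>();

    ToyFoNode(String name) {
        this.name = name;
    }

    ToyFoNode addChild(ToyFoNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }

    /** Detaches a child so the tree no longer keeps its subtree alive. */
    void removeChild(ToyFoNode child) {
        if (children.remove(child)) {
            child.parent = null;
        }
    }

    ToyFoNode getParent() {
        return parent;
    }

    /** Counts this node plus everything still reachable below it. */
    int countReachable() {
        int n = 1;
        for (ToyFoNode c : children) {
            n += c.countReachable();
        }
        return n;
    }
}

/** Hypothetical layout manager that lets go of its FO after adding areas. */
class ToyLayoutManager {
    private ToyFoNode fo;

    ToyLayoutManager(ToyFoNode fo) {
        this.fo = fo;
    }

    String addAreas() {
        String area = "area for " + fo.name;
        ToyFoNode parent = fo.getParent();
        if (parent != null) {
            parent.removeChild(fo); // the tree releases its reference
        }
        fo = null;                  // the LM releases its reference
        return area;
    }

    boolean holdsFo() {
        return fo != null;
    }
}

class ToyReleaseDemo {
    public static void main(String[] args) {
        ToyFoNode table = new ToyFoNode("table");
        ToyFoNode header = table.addChild(new ToyFoNode("table-header"));
        header.addChild(new ToyFoNode("cell-1"));
        header.addChild(new ToyFoNode("cell-2"));
        table.addChild(new ToyFoNode("table-body"));

        ToyLayoutManager headerLM = new ToyLayoutManager(header);
        header = null; // drop the local reference as well
        System.out.println(table.countReachable()); // prints 5
        System.out.println(headerLM.addAreas());    // prints area for table-header
        System.out.println(table.countReachable()); // prints 2
    }
}
```

The sketch sidesteps the hard part, of course: knowing when an LM will really never need its FO again (markers, repeated headers and the like).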
The overall changes would be far from trivial though, AFAICT, but I'd
love to see some more brainstorming in this direction. Biggest
problem, IIC, is that AbstractBreaker.doLayout() currently performs
everything in one go.
Cheers,
Andreas