Hi people,
I know I've mentioned it a few times before, so here's another
attempt at brainstorming about possible improvements in interaction
between fotree and layoutengine.
(So, obviously, what follows does not apply to the output formats that
bypass the layoutengine, like RTF.)
In short, it concerns two aspects of this interaction that, when
revisited, could lead to a much faster and less memory-hungry FOP.
At least, that's the idea. I have read some on Java performance
tuning, but would hardly call myself an expert on the matter...
The first would be FOP-internal multi-threading, which is already
possible at a higher level (and relatively easy to implement, IIC).
Right now, we do roughly, single-threaded:
a) catch SAX events, and build a sort of FOP-DOM (the FOTree) until
we reach the end of a page-sequence
b) perform layout for the entire page-sequence
c) create the area tree, and render it
Only after c) do we go back to a), while parsing could easily have
continued if b) had been started in a separate thread, at least for
the first a).
Hope you can follow... :/
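To make that a bit more concrete, here is a rough sketch of the hand-off in plain Java. None of these names are FOP classes: ParsedPageSequence, layoutAndRender() and run() are made up for illustration, and the "parsing" is only simulated. It assumes a single worker thread for b) and c), so that page-sequences still come out in document order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipelineSketch {

    /** Hypothetical stand-in for a fully parsed page-sequence (not a FOP class). */
    static final class ParsedPageSequence {
        final int id;
        final int foCount;
        ParsedPageSequence(int id, int foCount) {
            this.id = id;
            this.foCount = foCount;
        }
    }

    /** Stand-in for steps b) and c): layout and rendering of one page-sequence. */
    static String layoutAndRender(ParsedPageSequence ps) {
        return "page-sequence " + ps.id + ": laid out " + ps.foCount + " FOs";
    }

    /** Step a) runs on the calling thread; b) and c) run on one worker thread. */
    static List<String> run(int[] foCounts) throws Exception {
        ExecutorService layoutWorker = Executors.newSingleThreadExecutor();
        List<Future<String>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < foCounts.length; i++) {
                // a) "parse" the next page-sequence (simulated here)
                ParsedPageSequence ps = new ParsedPageSequence(i + 1, foCounts[i]);
                // hand b) + c) to the worker and go straight back to a)
                pending.add(layoutWorker.submit(() -> layoutAndRender(ps)));
            }
            // collect the results in submission (document) order
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } finally {
            layoutWorker.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (String line : run(new int[] {120, 45, 300})) {
            System.out.println(line);
        }
    }
}
```

Note that the worker's queue is unbounded in this sketch, so a fast parser could pile up parsed page-sequences in memory while layout lags behind.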
As Chris recently mentioned, forcing this /would/ make FOP
non-compliant with the J2EE Bean standard, so this could probably be made
part of the configuration somehow (and, leaping to Adrian's recent
post on fop-users@: this could then be specified in the XSLT).
At first glance, by itself, this would make FOP actually more memory-
hungry than it currently is. Imagine a handful of 20-page sequences
being processed concurrently, and consider the number of objects
involved if we're talking about lots of small blocks, inlines, tables
etc.
So, on top of that, I'm thinking of making b) less of a monolithic
process.
At the moment, we always wait for an endPageSequence() call on the
AreaTreeHandler, which works fine for small to medium-sized
page-sequences, but is definitely not scalable to larger ones consisting
of a lot of FOs. I think we should take a look at implementing
endFlow(), for instance, or startFlow(). At those points, we are already
guaranteed to have at least the part of the FOTree that is necessary
to perform some basic preliminary layout (the *ahem* Pagination and
Layout FOs).
Other handlers that could turn out to be interesting to implement
(but I'm guessing this to be interesting only for the flow, not for
the static-content):
* endExternalGraphic() / endInstreamForeignObject()
* endInlineContainer() [... ;-) ...] / endBlockContainer()
* endInline() / endBlock()
In fact, there are a whole lot more. The idea is obviously not to
have the PageSequenceLM run over the entire page-sequence multiple
times, but to have the next childLM continue where the last one left
off. If the area addition is started in yet another thread, I think,
it would even become possible to release/GC parts of the FOTree (have
the LMs dereference their FO) long before we even reach the first
endPageSequence() during parsing.
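A minimal sketch of the "continue where the last one left off" idea, again in plain Java with made-up names (BlockFO and IncrementalHandler are hypothetical, not FOP classes). It only models the event flow for endBlock(): real layout, page breaking and threading are left out, and each "area" is just a string.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IncrementalLayoutSketch {

    /** Hypothetical stand-in for a completed block-level FO (not a FOP class). */
    static final class BlockFO {
        final String text;
        BlockFO(String text) {
            this.text = text;
        }
    }

    /** Hypothetical handler: lays out each block as soon as its end event arrives. */
    static final class IncrementalHandler {
        private final Deque<BlockFO> waiting = new ArrayDeque<>();
        private final List<String> areas = new ArrayList<>();
        private int maxWaiting = 0;

        /** Called when parsing reaches the end of a block. */
        void endBlock(BlockFO fo) {
            waiting.add(fo);
            maxWaiting = Math.max(maxWaiting, waiting.size());
            layoutWaitingBlocks();
        }

        /** By now only blocks that were not laid out yet remain to be handled. */
        void endPageSequence() {
            layoutWaitingBlocks();
        }

        /** Continues where the previous call left off; earlier blocks are never revisited. */
        private void layoutWaitingBlocks() {
            BlockFO fo;
            while ((fo = waiting.poll()) != null) {
                areas.add("area[" + fo.text + "]");
                // The handler keeps no further reference to fo after this point,
                // so it no longer prevents that part of the tree from being collected.
            }
        }

        List<String> areas() {
            return areas;
        }

        int maxWaiting() {
            return maxWaiting;
        }
    }

    public static void main(String[] args) {
        IncrementalHandler handler = new IncrementalHandler();
        handler.endBlock(new BlockFO("first"));
        handler.endBlock(new BlockFO("second"));
        handler.endPageSequence();
        System.out.println(handler.areas() + ", max waiting: " + handler.maxWaiting());
    }
}
```

The maxWaiting counter is only there to show that the handler never holds more than the block it is currently working on, instead of the whole page-sequence.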
I'm not entirely sure yet, but I have a vague feeling that Simon's
Interleaved_Page_Line_Breaking branch will be quite beneficial in
getting this right (or may even be the key to making the whole thing
feasible in the first place).
If anyone has specific ideas/thoughts in this area (or questions),
I'm definitely interested in anything you see in the code that I may
have missed.
The goal is of course, as always: to have FOP format 20 copies of the
Encyclopedia Britannica at the same time without exceeding the 64MB
limit... ;-)
Cheers
Andreas