Hi people,

I know I've mentioned it a few times before, so here's another attempt at brainstorming about possible improvements in the interaction between fotree and layoutengine. (Obviously, what follows does not apply to output formats that bypass the layoutengine, such as RTF.)

In short, it concerns two aspects of this interaction that, when revisited, could lead to a much faster and less memory-hungry FOP. At least, that's the idea. I have read some on Java performance tuning, but would hardly call myself an expert on the matter...

The first would be FOP-internal multi-threading, which is already possible at the higher level (and relatively easy to implement, IIC).
Right now, we do roughly, single-threaded:
a) catch SAX events, and build a sort of FOP-DOM (the FOTree) until we reach the end of a page-sequence
b) perform layout for the entire page-sequence
c) create the area tree, and render it

Only after c) do we go back to a), whereas parsing could easily have continued in the meantime if b) had been started in a separate thread, at least for the first a).
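
To make the idea a bit more concrete, here is a minimal sketch of handing each finished page-sequence to a separate layout thread. All names (PipelinedPageSequences, layoutAndRender, process) are made up for illustration and are not FOP classes; page-sequences are plain strings, and a single worker thread is assumed so that output stays in document order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in names; none of these are actual FOP classes.
class PipelinedPageSequences {

    /** Stand-in for steps b) and c): lay out and render one page-sequence. */
    static String layoutAndRender(String pageSequence) {
        return "rendered(" + pageSequence + ")";
    }

    /**
     * Step a) runs on the calling thread. Each completed page-sequence is
     * submitted to one worker thread, so the caller can go on parsing
     * while layout runs. One worker keeps results in document order.
     */
    static List<String> process(List<String> parsedPageSequences) {
        ExecutorService layoutThread = Executors.newSingleThreadExecutor();
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final String ps : parsedPageSequences) {
                // Equivalent of reaching the end of a page-sequence:
                // submit the work and return to parsing immediately.
                Callable<String> task = () -> layoutAndRender(ps);
                pending.add(layoutThread.submit(task));
            }
            List<String> rendered = new ArrayList<>();
            for (Future<String> f : pending) {
                rendered.add(f.get()); // wait for each layout to finish
            }
            return rendered;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("layout failed", e);
        } finally {
            layoutThread.shutdown();
        }
    }
}
```

Note that every submitted page-sequence stays in memory until its layout finishes, which is exactly the memory concern with this approach taken by itself.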

Hope you can follow... :/

As Chris recently mentioned, forcing this /would/ make FOP non-compliant to the J2EE Bean standard, so this could probably be made part of the configuration somehow (and, leaping to Adrian's recent post on fop-users@: this could then be specified in the XSLT).

At first glance, by itself, this would make FOP actually more memory-hungry than it currently is. Imagine a handful of 20-page sequences being processed concurrently, and consider the number of objects involved if we're talking about lots of small blocks, inlines, tables etc.

So, on top of that, I'm thinking of making b) less of a monolithic process. At the moment, we always wait for an endPageSequence() call on the AreaTreeHandler, which works fine for small to medium-sized page-sequences, but is definitely not scalable to larger ones consisting of a lot of FOs. I think we should take a look at implementing endFlow(), for instance, or startFlow(). At those points, we are already guaranteed to have at least the part of the FOTree that is necessary to perform some basic preliminary layout (the *ahem* Pagination and Layout FOs).
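
Roughly, and only as a sketch, a handler that reacts to the flow events instead of waiting for the end of the page-sequence could look like this. IncrementalLayoutHandler and its log are hypothetical and do not reflect the real AreaTreeHandler API; the strings merely mark where layout work could be triggered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is not the real AreaTreeHandler API.
class IncrementalLayoutHandler {
    /** Records which layout step would be triggered by which event. */
    final List<String> log = new ArrayList<>();
    private boolean pagesPrepared = false;

    void startFlow(String flowName) {
        // The pagination and layout FOs have been parsed by now,
        // so preliminary page setup does not have to wait any longer.
        pagesPrepared = true;
        log.add("prepare pages for " + flowName);
    }

    void endFlow(String flowName) {
        // The whole flow is available: its layout can start here.
        log.add("layout flow " + flowName);
    }

    void endPageSequence() {
        // Fallback: if no flow event was seen, do the setup now.
        if (!pagesPrepared) {
            log.add("prepare pages (late)");
        }
        log.add("finish page-sequence");
        pagesPrepared = false;
    }
}
```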

Other handlers that could turn out to be interesting to implement (but I'm guessing this to be interesting only for the flow, not for the static-content):
* endExternalGraphic() / endInstreamForeignObject()
* endInlineContainer() [... ;-) ...] / endBlockContainer()
* endInline() / endBlock()

In fact, there are a whole lot more. The idea is obviously not to have the PageSequenceLM run over the entire page-sequence multiple times, but to have the next childLM continue where the last one left off. If the area addition is started in yet another thread, I think it would even become possible to release/GC parts of the FOTree (have the LMs dereference their FO) long before we even reach the first endPageSequence() during parsing.
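
A minimal sketch of "continue where the last one left off" combined with dereferencing the FOs, assuming a simple cursor over the children received so far. ResumableLayout and its methods are hypothetical names, not FOP's LayoutManager classes, and strings stand in for FOs and areas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; not FOP's actual LayoutManager classes.
class ResumableLayout {
    private final List<String> children = new ArrayList<>(); // stand-ins for child FOs
    private int cursor = 0; // index of the first child not yet laid out
    final List<String> areas = new ArrayList<>();

    /** Called as the parser delivers each finished child FO. */
    void addChild(String fo) {
        children.add(fo);
    }

    /** Lays out only the children added since the previous call. */
    int layoutAvailable() {
        int done = 0;
        while (cursor < children.size()) {
            areas.add("area:" + children.get(cursor));
            // Drop the reference so the FO becomes eligible for GC.
            children.set(cursor, null);
            cursor++;
            done++;
        }
        return done;
    }

    /** True once no FO is referenced from this object any more. */
    boolean holdsNoFos() {
        for (String c : children) {
            if (c != null) {
                return false;
            }
        }
        return true;
    }
}
```

Each call to layoutAvailable() touches a child exactly once, so repeated calls during parsing never re-run layout over the part that is already done.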

I'm not entirely sure yet, but I have a vague feeling that Simon's Interleaved_Page_Line_Breaking branch will be quite beneficial in getting this right (or may even be the key to making the whole thing feasible in the first place).

If anyone has specific ideas/thoughts in this area (or questions), I'm definitely interested in anything you see in the code that I may have missed.

The goal is of course, as always: to have FOP format 20 copies of the Encyclopedia Britannica at the same time without exceeding the 64MB limit... ;-)


