Re: Thoughts on interaction between FOTree and layoutengine

Andreas L Delmelle Wed, 16 Jan 2008 13:32:52 -0800

On Jan 16, 2008, at 08:38, Jeremias Maerki wrote:

Hi Jeremias

On 16.01.2008 01:20:36 Andreas L Delmelle wrote:

<snip />

At the moment, we always wait for an endPageSequence() call on the
AreaTreeHandler, which works fine for small to medium-sized
page-sequences, but is definitely not scalable to larger ones consisting
of a lot of FOs. I think we should take a look at implementing endFlow(),
for instance, or startFlow(). At those points, we are already
guaranteed to have at least the part of the FOTree that is necessary
to perform some basic preliminary layout (the *ahem* Pagination and
Layout FOs).


What's "preliminary layout"? I don't get it.


Sorry, my bad. I think I meant "layout preparation".

As in: the fo:layout-master-set is completely available, so a certain
(minimal) amount of empty pages could already be prepared here,
without knowing anything about the fo:flow or its descendants.
startPageSequence() would also be an option. The idea is to
initialize the PageSequenceLM as early as possible, where right now,
it does not even exist until the endPageSequence() event occurs. The
only benefit of waiting this long is that we have a guarantee that
no layout work will be performed unless we are 100% certain that the
produced FO is valid. The downsides for the larger and more complex
documents, however, seem to outweigh this one benefit, especially if
you take into account that a later page-sequence may still cause the
document to fail...
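
A minimal sketch of the "prepare early" idea, for illustration only. The SequenceEvents interface and the EagerLayoutHandler class are hypothetical stand-ins with String ids, not FOP's actual handler API; the point is merely that preparation moves from endPageSequence() to startPageSequence(), and that a flow can be handed to layout as soon as it ends.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified event callbacks; not FOP's real API. */
interface SequenceEvents {
    void startPageSequence(String id);
    void endFlow(String id);
    void endPageSequence(String id);
}

/** Records the order in which preparation and layout steps happen. */
class EagerLayoutHandler implements SequenceEvents {
    final List<String> log = new ArrayList<>();
    private boolean prepared;

    @Override
    public void startPageSequence(String id) {
        // The layout-master-set is complete at this point, so a
        // (hypothetical) page-sequence layout manager can be set up now.
        prepared = true;
        log.add("prepare:" + id);
    }

    @Override
    public void endFlow(String id) {
        // The flow is complete; layout can start before the sequence ends.
        if (!prepared) {
            throw new IllegalStateException("no layout manager for " + id);
        }
        log.add("layout-flow:" + id);
    }

    @Override
    public void endPageSequence(String id) {
        // Only the remaining work is left for this late event.
        log.add("finish:" + id);
        prepared = false;
    }
}

class EagerLayoutDemo {
    /** Replays the events of one page-sequence and returns the log. */
    static List<String> run() {
        EagerLayoutHandler handler = new EagerLayoutHandler();
        handler.startPageSequence("ps1");
        handler.endFlow("ps1");
        handler.endPageSequence("ps1");
        return handler.log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The trade-off stated above still applies: with this ordering, layout work may already have been done for a document that later turns out to be invalid.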

For the record: this is a general problem that I have already
encountered in a lot of XML applications. Enormously large XML files
lead to trouble, since the applications are DOM-based behind the
scenes. The only reason FOP is able to handle fo-files that cannot
even be opened in a lot of XML-editors, simply due to
memory-limitations, is precisely that it avoids creating a DOM for the
entire FO. We already use a nice combination of both approaches, but
it still offers room for expansion.


startFlow/endFlow doesn't help at all. That only excludes the static
content from the page-sequence. One flow could still be huge.

Indeed, I agree that this wouldn't help, if it's only restricted to
those two.
OTOH, startFlow/endFlow are called for static-contents too (see
fo.pagination.StaticContent#startOfNode()/endOfNode()).

Other handlers that could turn out to be interesting to implement
(but I'm guessing this to be interesting only for the flow, not for
the static-content):
* endExternalGraphic() / endInstreamForeignObject()
* endInlineContainer() [... ;-) ...] / endBlockContainer()
* endInline() / endBlock()

In fact, there are a whole lot more. The idea is obviously not to
have the PageSequenceLM run over the entire page-sequence multiple
times, but to have the next childLM continue where the last one left
off. If the area addition is started in yet another thread, I think,
it would even become possible to release/GC parts of the FOTree (have
the LMs dereference their FO) long before we even reach the first
endPageSequence() during parsing.

The key here would be to have mechanisms to limit memory consumption. If
the FO is built up faster than the layout engine can consume it you
still haven't gained anything.


Very good point!
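
One common way to get such a limit is a bounded hand-off between the two sides: the FO builder blocks as soon as it is a fixed number of nodes ahead of layout. The sketch below only illustrates that mechanism; BoundedFoPipeline, the String "nodes" and the capacity value are made-up placeholders, not anything that exists in FOP.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedFoPipeline {
    /** Sentinel that tells the consumer the producer is done. */
    private static final String END = "<end>";

    /**
     * Pushes nodeCount placeholder nodes through a queue holding at most
     * capacity entries and returns how many the consumer processed.
     */
    static int run(int nodeCount, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int[] consumed = {0};

        // Stand-in for the layout engine: takes nodes one by one.
        Thread layout = new Thread(() -> {
            try {
                while (true) {
                    String node = queue.take();
                    if (node.equals(END)) {
                        return;
                    }
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        layout.start();

        try {
            // Stand-in for the FO tree builder.
            for (int i = 0; i < nodeCount; i++) {
                // put() blocks while the queue is full, so the builder can
                // never be more than 'capacity' nodes ahead of layout.
                queue.put("block-" + i);
            }
            queue.put(END);
            layout.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed[0];
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 8));
    }
}
```

This caps how far parsing runs ahead, but it does nothing about nodes that layout has already consumed and still references.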

Smells like a lot of thread
synchronization and complexity if you do it the multi-threading way.
Even single-threaded, the complexity would grow again because there
will be more interaction between the different parts of FOP.

The complexity can be kept at the strict minimum by limiting the
number of threads to an amount you can count on one hand (5 max.),
which would incidentally also place a limit on memory-consumption.
At least, it would prevent OOMErrors due to 20 page-sequences being
processed at the same time, but we could still run out of memory due
to 2 or 3 page-sequences of 100 pages each.
Besides that, a larger number of threads would be worse for
performance *and* would make the whole thing too difficult to debug
and maintain. (remember the initial PropertyCache I committed, with
the way-too-many CleanerThreads... a headache to debug, and a
performance bottleneck: this should obviously be avoided :/)
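
As a rough illustration of the "5 max." cap, a fixed-size thread pool from java.util.concurrent keeps the number of page-sequences in layout at any moment at or below the pool size. LimitedLayoutPool and the sleep that stands in for layout work are assumptions made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class LimitedLayoutPool {
    static final int MAX_THREADS = 5;

    /**
     * Submits one task per page-sequence and returns the highest number
     * of tasks that were running at the same time.
     */
    static int run(int pageSequences) {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_THREADS);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < pageSequences; i++) {
                pending.add(pool.submit(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // placeholder for layout work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                    }
                }));
            }
            // Wait for every page-sequence to finish.
            for (Future<?> task : pending) {
                task.get();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + run(20));
    }
}
```

Note that the pool only limits how many page-sequences are being laid out at once. Submitted tasks that are still waiting remain queued, so this alone does not bound the memory held by page-sequences that have been parsed but not yet laid out.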

Furthermore, you need to know exactly when you can release an FO tree
or layout object, i.e. when you're absolutely sure that you won't need
it anymore. Currently, the first inline FO in a page-sequence is kept in
memory even if the layout engine is already on page 234.

Yep, I know. At one time, I tried (very simply) to clear FOText.ca in
TextLayoutManager, since TextLM duplicates the array upon
initialization.
If I remember correctly, when the areas are added, the original
FOText.ca is still referenced, so I ended up with a
NullPointerException...

Suddenly I'm thinking we'd also need to take care that we don't
enforce this, since there are definitely use-cases (a 'live' FO
editor), where it becomes necessary/desirable to maintain the entire
FOTree at all times (or at least a link between the original FO and
the generated Area). For those cases, instead of releasing the
objects, we could consider serialization/deserialization. To disk, or
even, as I seem to remember being suggested in a Bugzilla report
(1063), by using a dedicated database engine (think: optional
dependency on Apache Derby, and some relatively straightforward JDBC
code).
The latter could turn out to be an important feature, since dedicated
application servers on which FOP runs usually appreciate any spare
byte of disk space and as little unnecessary stress on disk I/O as
possible so it can be reserved for heap-swapping.
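
To make the disk variant concrete, here is a small sketch using plain Java serialization and a temporary file. FoNode and FoSpill are hypothetical classes invented for the example (FoNode is not FOP's own node class), and the database variant is not shown.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal FO node; not FOP's own node class. */
class FoNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<FoNode> children = new ArrayList<>();

    FoNode(String name) {
        this.name = name;
    }
}

class FoSpill {
    /** Writes a subtree to a temporary file and returns its location. */
    static Path store(FoNode root) {
        try {
            Path file = Files.createTempFile("fo-subtree", ".ser");
            file.toFile().deleteOnExit();
            try (ObjectOutputStream out =
                     new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(root);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a previously stored subtree back into memory. */
    static FoNode load(Path file) {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(file))) {
            return (FoNode) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FoNode block = new FoNode("fo:block");
        block.children.add(new FoNode("fo:inline"));
        Path file = store(block);
        // The in-memory subtree could be dropped here and restored on demand.
        System.out.println(load(file).children.get(0).name);
    }
}
```

A restored subtree is a copy, so any link between the original FO and a generated Area would have to be re-established by some key rather than by object identity.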

I'm not entirely sure yet, but I have a vague feeling that Simon's
Interleaved_Page_Line_Breaking branch will be quite beneficial in
getting this right (or may even be the key to making the whole thing
feasible in the first place).

His work is the precondition for that to become possible in the first
place.


I agree, especially after just reading Simon's response.

I appreciate you starting this discussion but I think it's slightly too
soon.


Thought so too, but then I started dreaming again... :-)

Anyway, thanks for the feedback!


Cheers

Andreas
