On 16.01.2008 01:20:36 Andreas L Delmelle wrote:
>
> Hi people,
>
> I know I've mentioned it a few times before, so here's another
> attempt at brainstorming about possible improvements in interaction
> between fotree and layoutengine.
> (So, obviously what follows does not apply to the output formats that
> bypass the layoutengine, like RTF)
>
> In short, it concerns two aspects of this interaction that, when
> revisited, could lead to a much faster and less memory-hungry FOP.
> At least, that's the idea. I have read some on Java performance
> tuning, but would hardly call myself an expert on the matter...
>
> The first would be FOP-internal multi-threading, which is already
> possible at the higher-level (and relatively easy to implement, IIC).
> Right now, we do roughly, single-threaded:
> a) catch SAX events, and build a sort of FOP-DOM (the FOTree) until
>    we reach the end of a page-sequence
> b) perform layout for the entire page-sequence
> c) create the area tree, and render it
>
> After c) we go back to a), while that process could have easily
> continued by starting b) in a separate thread, at least for the first
> a).
>
> Hope you can follow... :/
>
> As Chris recently mentioned, forcing this /would/ make FOP
> non-compliant to the J2EE Bean standard, so this could probably be
> made part of the configuration somehow (and, leaping to Adrian's
> recent post on fop-users@: this could then be specified in the XSLT).
>
> At first glance, by itself, this would make FOP actually more
> memory-hungry than it currently is.
Not only at first glance. It's one major problem when you start
multi-threading (more below).

> Imagine a handful of 20-page sequences
> being processed concurrently, and consider the number of objects
> involved if we're talking about lots of small blocks, inlines, tables
> etc.
>
> So, on top of that, I'm thinking of making b) less of a monolithic
> process.
> At the moment, we always wait for an endPageSequence() call on the
> AreaTreeHandler, which works fine for small to medium-sized
> page-sequences, but is definitely not scalable to larger ones
> consisting of a lot of FOs. I think we should take a look at
> implementing endFlow(), for instance, or startFlow(). At those points,
> we are already guaranteed to have at least the part of the FOTree that
> is necessary to perform some basic preliminary layout (the *ahem*
> Pagination and Layout FOs).

What's "preliminary layout"? I don't get it. startFlow/endFlow doesn't
help at all. That only excludes the static content from the
page-sequence. One flow could still be huge.

> Other handlers that could turn out to be interesting to implement
> (but I'm guessing this to be interesting only for the flow, not for
> the static-content):
> * endExternalGraphic() / endInstreamForeignObject()
> * endInlineContainer() [... ;-) ...] / endBlockContainer()
> * endInline() / endBlock()
>
> In fact, there are a whole lot more. The idea is obviously not to
> have the PageSequenceLM run over the entire page-sequence multiple
> times, but to have the next childLM continue where the last one left
> off. If the area addition is started in yet another thread, I think,
> it would even become possible to release/GC parts of the FOTree (have
> the LMs dereference their FO) long before we even reach the first
> endPageSequence() during parsing.

The key here would be to have mechanisms to limit memory consumption.
If the FO is built up faster than the layout engine can consume it you
still haven't gained anything.
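To make that concrete, here is a minimal sketch of one such mechanism, assuming a bounded queue sits between the thread that builds FO nodes and the thread that lays them out. Nothing in it is FOP API: the class name, the `run` method, the string "nodes" and the end marker are all hypothetical stand-ins. The point is only that a full queue blocks the producer, so the parser cannot run arbitrarily far ahead of layout.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only; none of these names exist in FOP.
public class BoundedHandoffSketch {

    // Hypothetical marker telling the consumer that parsing has finished.
    private static final String END_OF_INPUT = "<end>";

    /**
     * Runs a producer ("parser") and a consumer ("layout") joined by a
     * bounded queue. Returns {nodes consumed, largest queue size seen
     * by the producer}.
     */
    public static int[] run(int totalNodes, int capacity) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int[] result = new int[2];

        // Consumer: stands in for the layout engine working through FO nodes.
        Thread layout = new Thread(() -> {
            try {
                while (true) {
                    String node = queue.take();
                    if (END_OF_INPUT.equals(node)) {
                        break;
                    }
                    result[0]++; // "lay out" the node, then let it go
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        layout.start();

        // Producer: stands in for the SAX-driven FO tree builder.
        for (int i = 0; i < totalNodes; i++) {
            queue.put("fo-node-" + i); // blocks while the queue is full
            result[1] = Math.max(result[1], queue.size());
        }
        queue.put(END_OF_INPUT);
        layout.join(); // join() makes the consumer's count visible here
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(1000, 16);
        System.out.println("consumed=" + r[0] + " peak=" + r[1]);
    }
}
```

The capacity caps how many pending nodes exist at once, at the price of stalling the parser whenever layout falls behind. It does nothing about objects the layout side itself still holds on to, which is the harder part.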
Smells like a lot of thread synchronization and complexity if you do it
the multi-threading way. Even single-threaded, the complexity would grow
again because there will be more interaction between the different parts
of FOP. Furthermore, you need to know exactly when you can release an FO
tree or layout object, i.e. when you're absolutely sure that you won't
need it anymore. Currently, the first inline FO in a page-sequence is
kept in memory even if the layout engine is already on page 234.

> I'm not entirely sure yet, but I have a vague feeling that Simon's
> Interleaved_Page_Line_Breaking branch will be quite beneficial in
> getting this right (or may even be the key to making the whole thing
> feasible in the first place).

His work is the precondition for that to become possible in the first
place. I'd like to do this one step at a time. Once we have restartable
layout we can think about ways to limit the number of objects we keep in
memory at one point in time.

> If anyone has specific ideas/thoughts in this area (or questions),
> I'm definitely interested in anything you see in the code that I may
> have missed.
>
> The goal is of course, as always: to have FOP format 20 copies of the
> Encyclopedia Britannica at the same time without exceeding the 64MB
> limit... ;-)
>
>
> Cheers
>
> Andreas

I appreciate you starting this discussion but I think it's slightly too
soon.

Jeremias Maerki
