Re: Position, Leaf/NonLeafPosition, wrapping positions

Andreas L Delmelle Fri, 09 Mar 2007 12:46:43 -0800

On Mar 9, 2007, at 18:35, Vincent Hennebert wrote:

<snip />
  [Me:]

-> bind the PropertyList to the FONode

(= transfer the applicable properties for that particular node-type

       to instance members of the FONode; the PropertyList itself is
       only stored by the FOTreeBuilder to use as parent PropertyList
       for the FONode's childrens' PropertyLists)


When you say "transfer the applicable properties", you mean that
inheritance is also handled here? That is, from all the specified +
inherited properties, pick up the ones which apply?

Correct. Roughly each applicable property(-bundle) corresponds to an instance member (where the 'bundle' refers to the Common*Properties).


Property inheritance is handled entirely by the PropertyList.

The very first step is taken in FONode.processNode(), where the attributes are added to the list. This takes care of the explicitly specified properties.

In FObj.bind(), which is called right after that, each call to PropertyList.get() triggers the property resolution mechanism, roughly:

try specified value

try getting the implied value from a specified shorthand property if applicable

try inheritance if applicable

and if all this fails, fall back to the initial value defined in the Recommendation.
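The lookup order above can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchPropertyList, its tables and its method names are hypothetical and are not FOP's actual PropertyList API; values are kept as plain strings, and a shorthand is treated as a single value that applies unchanged to the longhand.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a property list; NOT FOP's PropertyList.
class SketchPropertyList {
    // Illustrative tables only; the authoritative data is in the XSL-FO Recommendation.
    private static final Set<String> INHERITED = Set.of("font-size", "text-align");
    private static final Map<String, String> INITIAL = Map.of(
            "font-size", "medium",
            "text-align", "start",
            "border-top-style", "none");
    // Maps a longhand property to the shorthand that can imply it (assumed, single-valued).
    private static final Map<String, String> SHORTHAND_FOR = Map.of(
            "border-top-style", "border-style");

    private final SketchPropertyList parent; // parent node's list, or null at the root
    private final Map<String, String> specified = new HashMap<>();

    SketchPropertyList(SketchPropertyList parent) {
        this.parent = parent;
    }

    // Corresponds to adding an explicitly specified attribute to the list.
    void specify(String name, String value) {
        specified.put(name, value);
    }

    String get(String name) {
        // 1. try the specified value
        String value = specified.get(name);
        if (value != null) {
            return value;
        }
        // 2. try the value implied by a specified shorthand, if applicable
        String shorthand = SHORTHAND_FOR.get(name);
        if (shorthand != null && specified.containsKey(shorthand)) {
            return specified.get(shorthand);
        }
        // 3. try inheritance, if the property is inherited and there is a parent
        if (INHERITED.contains(name) && parent != null) {
            return parent.get(name);
        }
        // 4. fall back to the initial value
        return INITIAL.get(name);
    }
}
```

In this sketch a child list created with its parent's list as constructor argument resolves "font-size" through the parent when it does not specify the property itself, while "border-top-style" is never inherited and falls back to its initial value.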

The only exception is the descendant nodes of fo:marker, for which all this happens only during layout, when the marker is actually retrieved. Here, a different type of PropertyList is used, which only stores the explicitly specified value as a string, and does not convert it to a Property yet (see: Marker.MarkerPropertyList).

When the MainFOHandler receives a startElement() event for an fo:marker, the context is switched, so that the descendant nodes know the bind() step should be skipped at that point.

This is done because the properties of marker-descendants must be resolved as if they had been specified as descendants of the static-content where the corresponding retrieve-marker is defined. For marker-descendants, "inheritance" actually means "inheritance from the ancestry of the retrieve-marker". Therefore, the RetrieveMarker keeps its PropertyList alive, so that it can serve as a parent PropertyList for the marker-descendants later on, when cloning the subtree.

<snip />
Some thoughts related to this:
It would anyway be best to start the layout process as soon as possible;
ideally there would be multiple, chained threads for the multiple tasks:
FO tree generation, Knuth elements generation, breaking, area tree
generation, rendering, etc. They would act like Unix pipes, in a
producer/consumer model where each thread would be fed by the thread it
depends on, and would itself feed the subsequent thread.
Questions are: does that make sense, when does a thread know it can
start its work, can we clearly separate the several processes, oh well
all those thread synchronizing issues, etc.
But that might give some real performance boost on multi-processor machines.
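A minimal sketch of such a chain of producer/consumer threads, assuming bounded queues between the stages. Everything here is hypothetical (PipelineSketch, the stage functions, the string payloads); it does not correspond to existing FOP classes and only shows the pipe-like hand-off, with an empty Optional as end-of-stream marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical illustration of chained stages; not part of FOP.
class PipelineSketch {

    // Starts one stage: take from 'in', transform, put on 'out'.
    // An empty Optional is forwarded as the end-of-stream marker.
    static void startStage(BlockingQueue<Optional<String>> in,
                           BlockingQueue<Optional<String>> out,
                           Function<String, String> work) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Optional<String> item = in.take(); // blocks until the previous stage delivers
                    if (!item.isPresent()) {
                        out.put(item);                 // pass the marker on and stop
                        return;
                    }
                    out.put(Optional.of(work.apply(item.get())));
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }

    static List<String> run(List<String> events) {
        // Bounded queues: a fast producer blocks instead of filling the heap.
        BlockingQueue<Optional<String>> parsed = new ArrayBlockingQueue<>(4);
        BlockingQueue<Optional<String>> nodes = new ArrayBlockingQueue<>(4);
        BlockingQueue<Optional<String>> elements = new ArrayBlockingQueue<>(4);

        startStage(parsed, nodes, event -> "node(" + event + ")");
        startStage(nodes, elements, node -> "elements(" + node + ")");

        // The feeder stands in for the parser delivering events.
        Thread feeder = new Thread(() -> {
            try {
                for (String event : events) {
                    parsed.put(Optional.of(event));
                }
                parsed.put(Optional.empty());
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        feeder.setDaemon(true);
        feeder.start();

        // The calling thread plays the role of the last consumer.
        List<String> result = new ArrayList<>();
        try {
            while (true) {
                Optional<String> item = elements.take();
                if (!item.isPresent()) {
                    return result;
                }
                result.add(item.get());
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", ex);
        }
    }
}
```

Each stage starts working as soon as its first input arrives, which is one possible answer to the "when does a thread know it can start" question; the sketch says nothing about whether the real FOP stages can be separated this cleanly.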

That's roughly the idea a few of us are dreaming of implementing, I think... if they had the time. :)

If you compare the codebase to FOP 0.20.5, you'll notice that a large part of the redesign precisely consisted of separating the different processes, mainly FO tree generation and layout. In FOP 0.20.5, there is very much layout-related code scattered in the FObjs.

OTOH, before starting to use separate threads, I think some other refactoring can/needs to be done first, if only to make it easier afterwards to implement the threads. I was already playing with the idea to move the LM-construction and some of the initialization to a separate LMInitThread, but then bumped into the mentioned problem. The list of child-nodes cannot be extended after the LM has been constructed. The fo:page-sequence, or at least one of its fo:flows must be completely parsed before instantiating a FlowLayoutManager.

[An attempt to work around this by dropping the ArrayList entirely, and using an iterator over a virtual list without this particular limitation can be found in Bugzilla 41656. Patch not applied yet, since I haven't done any extensive testing of its effects. I only know that the junit tests pass.]

The idea in the long run was to move towards a separation of concerns inside the layoutengine, which could then later on make room for something like a ListProducerThread, and a BreakerThread?

In the worst case, at first without multi-threading, FOP would use the exact same amount of heap as it does now, only distributed a bit differently. More of it sooner than it does now, but in the end not exceeding the current state, where we ultimately end up with a whole FO tree for a page-sequence on the one hand /and/ a corresponding tree of LayoutManagers. Not to mention all the Lists and ListElements that are generated in between...

If you don't give the breaking-algorithm a chance to reset from time to time, memory consumption shoots through the roof, as the number of break-possibilities grows.

See also my recent investigation of an OOMError when a huge block of #PCDATA is wrapped inside a single fo:block. A simple switch to linefeed-treatment="preserve", which generates forced breaks inside the text, made FOP pass the test without needing an absurd amount of heap space. Without preserved linefeeds, I needed a minimum of 768MB of heap to not run out of memory. For an output of around 35 pages...

I'm guessing that roughly the same thing applies to the page-breaking algorithm when the size of the page-sequence increases, consists of lots and lots of smaller elements and there are no explicit breaks: the more content, the more possible pages, the more break-possibilities that need to be remembered and compared, the more time spent on computing a single one of them...

Maybe this is inherent to the algorithm, I don't know, but I also cannot rule out that it is (partly) a consequence of its implementation.

In the longer term, we could start thinking about clearing the references to the FO nodes earlier. If a LayoutManager has no more need for its FO, release it (and clear it in the FO tree as well). As to how early this becomes possible, I'm not completely sure yet...

Once, when I noticed that a TextLM initializes itself with the properties of its associated FOText and creates a copy of its char array, I tried adding "FOText.ca = null;", and encountered a NullPointerException further on, which indicated that the original char array was in fact still needed at some point. Didn't investigate it further at the time, though. Maybe that reference could be replaced by one to the copy that resides in TextLM...

<snip />
Regarding the changing-IPD problem, I wrote some notes during the GSoC
last summer:
http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats#head-953fc5836ed422f91834ea15bf1e2515d0101300
I already explained my ideas to some of you. At one time I'll have to
write them down on a wiki page.

Indeed, I remember the discussion. IIRC, you mentioned one very interesting idea about combining the line- and page-breaking loops, which would be a huge step in the right direction. If my estimations are correct, it would solve the above mentioned scalability problem of the algorithm if it is no longer forced to take into account *all* preceding break-possibilities but only a certain amount of buffered possibilities...


<snip />

That seems to give some confirmation to my thread ideas above: a thread
for creating FONodes, one for LMs, one for layout; change from a pull
model to a push model: instead of requiring the next LM, the LM thread
would notify the layout thread that a new LM is available. Possibly
while being itself notified by the FONode thread that new nodes have
been created.

Moving away from the pull model would be a good idea, but we don't immediately need separate threads for that, IIC. I was thinking in the direction of using AreaTreeHandler.startPageSequence() to initialize the PageSequenceLM. The layout-master-set is available anyway at that point.

As such, endBlock() could then be used as a sort of marker event to signal to the PageSequenceLM that it can begin or resume its work...

Big benefit, if combined with your idea of merging the line- and page-breaking loop, would be that the next page's ipd will be known before the block's line-layout starts.
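A single-threaded sketch of that push idea, assuming the event handler creates the layout manager at the start of the page-sequence and then notifies it at the end of every block. The classes and method signatures below (SketchEventHandler, SketchPageSequenceLM, blockAvailable) are hypothetical and only borrow the event names from the discussion; they are not the real AreaTreeHandler or PageSequenceLayoutManager.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a page-sequence layout manager; not FOP's class.
class SketchPageSequenceLM {
    private final Deque<String> pending = new ArrayDeque<>();
    final List<String> laidOut = new ArrayList<>();

    // Called by the event handler: a complete block is now available.
    void blockAvailable(String blockId) {
        pending.add(blockId);
        resume();
    }

    // Begin or resume work on whatever has been pushed so far.
    private void resume() {
        while (!pending.isEmpty()) {
            laidOut.add("laid out " + pending.poll());
        }
    }
}

// Hypothetical stand-in for the FO event handler that drives layout.
class SketchEventHandler {
    private SketchPageSequenceLM currentLM;

    // The LM is created when the page-sequence starts, not when it ends.
    void startPageSequence() {
        currentLM = new SketchPageSequenceLM();
    }

    // Used as the marker event: push the finished block to the LM.
    void endBlock(String blockId) {
        currentLM.blockAvailable(blockId);
    }

    SketchPageSequenceLM currentLM() {
        return currentLM;
    }
}
```

No second thread is involved here: layout simply runs inside the endBlock() callback, so the parser and the layout take turns on the same thread.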

<snip />
The area tree that is the result of all this, is then handed off to the
renderer, which basically translates the area tree structure into PDF,
PS, XML...
... but that's another story ;-)


Yep, and there are other people more suitable than me to tell it. :)

I have not yet explored the renderer code to the furthest possible extent, so there's only little insight I can offer there.

That's it --for now :)
Those notes deserve their wiki page, to not get lost in the mailing list
archives. I'll create one as soon as I have time. The documentation part
of the website might also need some cleaning up, BTW.

Simon also once published a description of 'FOP at work' on his homepage, following and explaining the call stack of FOP while processing a document. This was, however, quite a while ago, dating back from before the implementation of the Knuth algorithm. Maybe it is still available and parts of it can be used to add to such a Wiki or to the development documentation...



Cheers,

Andreas
