On Mar 9, 2007, at 18:35, Vincent Hennebert wrote:

<snip />
-> bind the PropertyList to the FONode
(= transfer the applicable properties for that particular node-type
       to instance members of the FONode; the PropertyList itself is
       only stored by the FOTreeBuilder to use as parent PropertyList
       for the FONode's children's PropertyLists)

When you say "transfer the applicable properties", you mean that
inheritance is also handled here? That is, from all the specified +
inherited properties, pick up the ones which apply?

Correct. Roughly, each applicable property (or property bundle) corresponds to an instance member, where 'bundle' refers to the Common*Properties.

Property inheritance is handled entirely by the PropertyList.
The very first step is taken in FONode.processNode(), where the attributes are added to the list. This takes care of the explicitly specified properties. In FObj.bind(), which is called right after that, each call to PropertyList.get() triggers the property resolution mechanism, roughly:
1. try the specified value
2. try getting the implied value from a specified shorthand property, if applicable
3. try inheritance, if applicable
4. if all this fails, fall back to the initial value defined in the Recommendation.
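
To make that lookup order concrete, here is a minimal sketch. It is not FOP's actual PropertyList: the class name, the set of inherited properties, the placeholder initial values and the toy shorthand rule are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-in for a property list; NOT FOP's PropertyList. */
final class SketchPropertyList {
    // Assumed for illustration only: which properties inherit, and placeholder initial values.
    private static final Set<String> INHERITED = Set.of("font-size", "color");
    private static final Map<String, String> INITIAL = Map.of(
            "font-size", "medium",
            "color", "black",
            "border-top-width", "medium");

    private final SketchPropertyList parent;
    private final Map<String, String> specified = new HashMap<>();

    SketchPropertyList(SketchPropertyList parent) {
        this.parent = parent;
    }

    /** Corresponds to the step where the attributes are added to the list. */
    void putSpecified(String name, String value) {
        specified.put(name, value);
    }

    /** Resolves a property in the order: specified, shorthand, inherited, initial. */
    String get(String name) {
        String value = specified.get(name);          // 1. explicitly specified value
        if (value != null) {
            return value;
        }
        value = fromShorthand(name);                 // 2. implied by a specified shorthand
        if (value != null) {
            return value;
        }
        if (parent != null && INHERITED.contains(name)) {
            return parent.get(name);                 // 3. inheritance, if applicable
        }
        return INITIAL.get(name);                    // 4. initial value
    }

    /** Toy shorthand rule: a single-valued border-width implies border-top-width. */
    private String fromShorthand(String name) {
        if (name.equals("border-top-width")) {
            return specified.get("border-width");
        }
        return null;
    }

    public static void main(String[] args) {
        SketchPropertyList root = new SketchPropertyList(null);
        root.putSpecified("font-size", "12pt");
        SketchPropertyList child = new SketchPropertyList(root);
        child.putSpecified("border-width", "2pt");
        System.out.println(child.get("font-size"));        // inherited from root
        System.out.println(child.get("border-top-width")); // implied by the shorthand
        System.out.println(child.get("color"));            // falls back to the initial value
    }
}
```

Note that the inheritance step simply asks the parent list, which in turn runs through the same four steps itself.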

The only exception is the descendant nodes of fo:marker, for which all this happens only during layout, when the marker is actually retrieved. Here, a different type of PropertyList is used, which only stores the explicitly specified value as a string and does not convert it to a Property yet (see: Marker.MarkerPropertyList). When the MainFOHandler receives a startElement() event for an fo:marker, the context is switched, so that the descendant nodes know the bind() step should be skipped at that point.

This is done because the properties of marker-descendants must be resolved as if they had been specified as descendants of the static-content where the corresponding retrieve-marker is defined. For marker-descendants, "inheritance" actually means "inheritance from the ancestry of the retrieve-marker". Therefore, the RetrieveMarker keeps its PropertyList alive, so that it can serve as a parent PropertyList for the marker-descendants later on, when cloning the subtree.
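
A rough sketch of that deferred binding, under heavy simplification: the class names below are hypothetical (they are not Marker.MarkerPropertyList or RetrieveMarker), and every property is treated as inherited just to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

final class MarkerBindingSketch {

    /** Hypothetical resolved property list; treats every property as inherited. */
    static final class ResolvedProps {
        private final ResolvedProps parent;
        private final Map<String, String> specified;

        ResolvedProps(ResolvedProps parent, Map<String, String> specified) {
            this.parent = parent;
            this.specified = new HashMap<>(specified);
        }

        String get(String name) {
            String value = specified.get(name);
            if (value != null) {
                return value;
            }
            return parent != null ? parent.get(name) : null;
        }
    }

    /** Hypothetical deferred list: keeps the raw attribute strings and has no parent yet. */
    static final class DeferredProps {
        private final Map<String, String> rawAttributes;

        DeferredProps(Map<String, String> rawAttributes) {
            this.rawAttributes = new HashMap<>(rawAttributes);
        }

        /** Called at retrieval time, once the retrieving context's property list is known. */
        ResolvedProps bindTo(ResolvedProps retrieveMarkerProps) {
            return new ResolvedProps(retrieveMarkerProps, rawAttributes);
        }
    }

    public static void main(String[] args) {
        DeferredProps markerChild = new DeferredProps(Map.of("font-weight", "bold"));
        ResolvedProps header = new ResolvedProps(null, Map.of("color", "red"));
        ResolvedProps footer = new ResolvedProps(null, Map.of("color", "blue"));
        // The same marker content picks up a different color depending on where it is retrieved.
        System.out.println(markerChild.bindTo(header).get("color"));
        System.out.println(markerChild.bindTo(footer).get("color"));
    }
}
```

The point is only that the parent is chosen when the marker is retrieved, not when it is parsed.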

<snip />
Some thoughts related to this:
It would anyway be best to start the layout process as soon as possible; ideally there would be multiple, chained threads for the multiple tasks:
FO tree generation, Knuth elements generation, breaking, area tree
generation, rendering, etc. They would act like Unix pipes, in a
producer/consumer model where each thread would be fed by the thread it
depends on, and would itself feed the subsequent thread.
Questions are: does that make sense, when does a thread know it can
start its work, can we clearly separate the several processes, oh well
all those thread synchronizing issues, etc.
But that might give some real performance boost on multi-processor machines.
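
For what it's worth, the pipe-like producer/consumer chain described above could be sketched with bounded queues as follows. This is only an illustration of the model, not a proposal for FOP's code: the class name, the stage names and the string "work items" are all made up, and each stage is reduced to a trivial transformation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class PipelineSketch {
    // Sentinel ("poison pill") telling the next stage that no more input will arrive.
    private static final String END = "\u0000END";

    static List<String> run(List<String> foNodes) {
        // Bounded queues: a fast producer blocks instead of piling up work in memory.
        BlockingQueue<String> nodes = new ArrayBlockingQueue<>(4);
        BlockingQueue<String> elements = new ArrayBlockingQueue<>(4);
        List<String> areas = new ArrayList<>();

        // Stage 1 (stand-in for FO tree generation): feeds nodes into the first queue.
        Thread treeBuilder = new Thread(() -> {
            try {
                for (String node : foNodes) {
                    nodes.put(node);
                }
                nodes.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2 (stand-in for element generation): consumes nodes, produces elements.
        Thread elementGenerator = new Thread(() -> {
            try {
                for (String n = nodes.take(); !n.equals(END); n = nodes.take()) {
                    elements.put("elements(" + n + ")");
                }
                elements.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        treeBuilder.start();
        elementGenerator.start();
        try {
            // Stage 3 (stand-in for breaking/area generation) runs on the calling thread.
            for (String e = elements.take(); !e.equals(END); e = elements.take()) {
                areas.add("area(" + e + ")");
            }
            treeBuilder.join();
            elementGenerator.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return areas;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("fo:block", "fo:inline")));
    }
}
```

Each stage starts working as soon as its queue holds something, which answers the "when does a thread know it can start" question for this toy case only; the real difficulty is that FOP's stages are not this cleanly separable.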

That's roughly the idea a few of us are dreaming of implementing, I think... if they had the time. :) If you compare the codebase to FOP 0.20.5, you'll notice that a large part of the redesign consisted precisely of separating the different processes, mainly FO tree generation and layout. In FOP 0.20.5, a lot of layout-related code is scattered throughout the FObjs.

OTOH, before starting to use separate threads, I think some other refactoring can/needs to be done first, if only to make it easier afterwards to implement the threads. I was already playing with the idea to move the LM-construction and some of the initialization to a separate LMInitThread, but then bumped into the mentioned problem. The list of child-nodes cannot be extended after the LM has been constructed. The fo:page-sequence, or at least one of its fo:flows, must be completely parsed before instantiating a FlowLayoutManager. [An attempt to work around this by dropping the ArrayList entirely, and using an iterator over a virtual list without this particular limitation, can be found in Bugzilla 41656. Patch not applied yet, since I haven't done any extensive testing of its effects. I only know that the junit tests pass.]

The idea in the long run was to move towards a separation of concerns inside the layoutengine, which could then later on make room for something like a ListProducerThread and a BreakerThread?

In the worst case, at first without multi-threading, FOP would use the exact same amount of heap as it does now, only distributed a bit differently: more of it sooner than now, but in the end not exceeding the current state, where we ultimately end up with a whole FO tree for a page-sequence on the one hand /and/ a corresponding tree of LayoutManagers. Not to mention all the Lists and ListElements that are generated in between...

If you don't give the breaking-algorithm a chance to reset from time to time, memory consumption shoots through the roof as the number of break-possibilities grows. See also my recent investigation of an OOMError when a huge block of #PCDATA is wrapped inside a single fo:block. A simple switch to linefeed-treatment="preserve", which generates forced breaks inside the text, made FOP pass the test without needing an absurd amount of heap space. Without preserved linefeeds, I needed a minimum of 768MB of heap to not run out of memory. For an output of around 35 pages...

I'm guessing that roughly the same thing applies to the page-breaking algorithm when the page-sequence grows, consists of lots and lots of smaller elements, and contains no explicit breaks: the more content, the more possible pages, the more break-possibilities that need to be remembered and compared, the more time spent on computing a single one of them...

Maybe this is inherent to the algorithm, I don't know, but I also cannot rule out that it is (partly) a consequence of its implementation.

In the longer term, we could start thinking about clearing the references to the FO nodes earlier. If a LayoutManager has no more need for its FO, release it (and clear it in the FO tree as well). As to how early this becomes possible, I'm not completely sure yet... Once, when I noticed that a TextLM initializes itself with the properties of its associated FOText and creates a copy of its char array, I tried adding "FOText.ca = null;", and encountered a NullPointerException further on, which indicated that the original char array was in fact still needed at some point. Didn't investigate it further at the time, though. Maybe that reference could be replaced by one to the copy that resides in TextLM...

<snip />
Regarding the changing-IPD problem, I wrote some notes during the GSoC
last summer:
http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats#head-953fc5836ed422f91834ea15bf1e2515d0101300
I already explained my ideas to some of you. At some point I'll have to
write them down on a wiki page.

Indeed, I remember the discussion. IIRC, you mentioned one very interesting idea about combining the line- and page-breaking loops, which would be a huge step in the right direction. If my estimations are correct, it would solve the above-mentioned scalability problem of the algorithm, since it would no longer be forced to take into account *all* preceding break-possibilities, but only a certain number of buffered possibilities...

<snip />
That seems to give some confirmation to my thread ideas above: a thread
for creating FONodes, one for LMs, one for layout; change from a pull
model to a push model: instead of requiring the next LM, the LM thread
would notify the layout thread that a new LM is available. Possibly
while being itself notified by the FONode thread that new nodes have
been created.

Moving away from the pull model would be a good idea, but we don't immediately need separate threads for that, IIC. I was thinking in the direction of using AreaTreeHandler.startPageSequence() to initialize the PageSequenceLM. The layout-master-set is available anyway at that point. As such, endBlock() could then be used as a sort of marker event to signal to the PageSequenceLM that it can begin or resume its work... Big benefit, if combined with your idea of merging the line- and page-breaking loop, would be that the next page's ipd will be known before the block's line-layout starts.
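
A single-threaded sketch of what such a push model might look like. The classes below are hypothetical stand-ins, not the real AreaTreeHandler or PageSequenceLayoutManager, and "layout" is reduced to recording a string; the only thing borrowed from the above is the idea of startPageSequence() and endBlock() acting as the triggering events.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class PushModelSketch {

    /** Hypothetical stand-in for a page-sequence layout manager. */
    static final class SketchPageSequenceLM {
        private final Deque<String> pending = new ArrayDeque<>();
        private final List<String> laidOut = new ArrayList<>();

        /** Push notification: a complete block is available, so layout may resume. */
        void blockAvailable(String block) {
            pending.add(block);
            resume();
        }

        /** Processes whatever has been pushed so far, then returns to the caller. */
        private void resume() {
            while (!pending.isEmpty()) {
                laidOut.add("laid out " + pending.poll());
            }
        }

        List<String> laidOut() {
            return laidOut;
        }
    }

    /** Hypothetical stand-in for the event handler that receives FO tree events. */
    static final class SketchHandler {
        private SketchPageSequenceLM lm;

        /** The layout manager is created as soon as the page-sequence starts. */
        void startPageSequence() {
            lm = new SketchPageSequenceLM();
        }

        /** The end of a block serves as the marker event that lets layout proceed. */
        void endBlock(String block) {
            lm.blockAvailable(block);
        }

        List<String> result() {
            return lm.laidOut();
        }
    }

    public static void main(String[] args) {
        SketchHandler handler = new SketchHandler();
        handler.startPageSequence();
        handler.endBlock("block-1");
        System.out.println(handler.result()); // first block is handled before the second arrives
        handler.endBlock("block-2");
        System.out.println(handler.result());
    }
}
```

Here the layout side never asks for the next item; it only reacts to what the parsing side hands over, which is the inversion meant by moving away from the pull model.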

<snip />
The area tree that results from all this is then handed off to the renderer, which basically translates the area tree structure into PDF,
PS, XML...

... but that's another story ;-)

Yep, and there are other people more suitable than me to tell it. :)

I have not yet explored the renderer code to the furthest possible extent, so there's only little insight I can offer there.

That's it --for now :)

Those notes deserve their wiki page, to not get lost in the mailing list archives. I'll create one as soon as I have time. The documentation part
of the website might also need some cleaning up, BTW.

Simon also once published a description of 'FOP at work' on his homepage, following and explaining the call stack of FOP while processing a document. This was, however, quite a while ago, dating from before the implementation of the Knuth algorithm. Maybe it is still available and parts of it can be used to add to such a Wiki or to the development documentation...


