On Mar 9, 2007, at 18:35, Vincent Hennebert wrote:
<snip />
[Me:]
-> bind the PropertyList to the FONode
(= transfer the applicable properties for that particular node-type
to instance members of the FONode; the PropertyList itself is
only stored by the FOTreeBuilder to use as parent PropertyList
for the PropertyLists of the FONode's children)
When you say "transfer the applicable properties", you mean that
inheritance is also handled here? That is, from all the specified +
inherited properties, pick up the ones which apply?
Correct. Roughly, each applicable property(-bundle) corresponds to an
instance member (where the 'bundle' refers to the Common*Properties).
Property inheritance is handled entirely by the PropertyList.
The very first step is taken in FONode.processNode(), where the
attributes are added to the list. This takes care of the explicitly
specified properties.
In FObj.bind(), which is called right after that, each call to
PropertyList.get() triggers the property resolution mechanism, roughly:
- try the specified value;
- try getting the implied value from a specified shorthand property,
  if applicable;
- try inheritance, if applicable;
- if all this fails, fall back to the initial value defined in the
  Recommendation.
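The resolution order described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: values are plain strings, the tables of inherited properties and initial values are made up for the example, and SketchPropertyList, addAttribute() and fromShorthand() are hypothetical names, not FOP's actual PropertyList API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, much-simplified stand-in for a property list; not FOP's real class. */
class SketchPropertyList {
    // Illustrative tables only (assumption); the real ones follow the Recommendation.
    private static final Set<String> INHERITED =
            new HashSet<>(Arrays.asList("color", "font-size"));
    private static final Map<String, String> INITIAL = new HashMap<>();
    static {
        INITIAL.put("color", "black");
        INITIAL.put("font-size", "medium");
        INITIAL.put("border-top-width", "medium");
    }

    private final SketchPropertyList parent;
    private final Map<String, String> specified = new HashMap<>();

    SketchPropertyList(SketchPropertyList parent) {
        this.parent = parent;
    }

    /** Records an explicitly specified attribute. */
    void addAttribute(String name, String value) {
        specified.put(name, value);
    }

    /** Resolves a property: specified, then shorthand, then inherited, then initial. */
    String get(String name) {
        String value = specified.get(name);                 // 1. specified value
        if (value == null) {
            value = fromShorthand(name);                    // 2. implied by a shorthand
        }
        if (value == null && parent != null && INHERITED.contains(name)) {
            value = parent.get(name);                       // 3. inheritance
        }
        return value != null ? value : INITIAL.get(name);   // 4. initial value
    }

    // Toy shorthand rule (assumption): a single-valued "border-width"
    // implies "border-top-width".
    private String fromShorthand(String name) {
        if ("border-top-width".equals(name)) {
            return specified.get("border-width");
        }
        return null;
    }
}
```

With a root list that specifies color="red" and a child list that specifies only border-width="2pt", the child resolves "color" through inheritance, "border-top-width" through the shorthand, and "font-size" through the initial value.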
The only exceptions are descendant nodes of fo:marker, for which all
this happens only during layout, when the marker is actually
retrieved. Here, a different type of PropertyList is used, which
stores only the explicitly specified value as a string and does not
yet convert it to a Property (see Marker.MarkerPropertyList).
When the MainFOHandler receives a startElement() event for an
fo:marker, the context is switched, so that the descendant nodes know
the bind() step should be skipped at that point.
This is done because the properties of marker-descendants must be
resolved as if they had been specified as descendants of the static-
content where the corresponding retrieve-marker is defined. For
marker-descendants, "inheritance" actually means "inheritance from
the ancestry of the retrieve-marker". Therefore, the RetrieveMarker
keeps its PropertyList alive, so that it can serve as a parent
PropertyList for the marker-descendants later on, when cloning the
subtree.
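To illustrate why the resolution is deferred for marker-descendants, here is a small sketch. It assumes a deliberately simplified model in which values are plain strings and every property is inheritable; DeferredMarkerProps and its methods are hypothetical names and do not reflect the actual Marker.MarkerPropertyList code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: keeps raw attribute strings and resolves them only on retrieval. */
class DeferredMarkerProps {
    // Explicitly specified values, stored unconverted while the FO tree is built.
    private final Map<String, String> raw = new HashMap<>();

    void addAttribute(String name, String value) {
        raw.put(name, value);
    }

    /**
     * Resolves a property at retrieval time. The second argument stands in for
     * the property list kept alive by the retrieve-marker (assumption: a plain
     * map of already resolved values, all of them treated as inheritable).
     */
    String resolve(String name, Map<String, String> retrieveMarkerContext,
                   String initialValue) {
        String value = raw.get(name);                 // specified on the marker descendant
        if (value == null) {
            value = retrieveMarkerContext.get(name);  // "inherited" from the retrieve-marker
        }
        return value != null ? value : initialValue;
    }
}
```

The same marker content retrieved in two different static-content areas can then end up with different values for any property it does not specify itself, which is the reason the bind() step cannot be done up front.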
<snip />
Some thoughts related to this:
It would anyway be best to start the layout process as soon as
possible; ideally there would be multiple, chained threads for the
multiple tasks: FO tree generation, Knuth elements generation,
breaking, area tree generation, rendering, etc. They would act like
Unix pipes, in a producer/consumer model where each thread would be
fed by the thread it depends on, and would itself feed the subsequent
thread.
Questions are: does that make sense, when does a thread know it can
start its work, can we clearly separate the several processes, oh well
all those thread synchronizing issues, etc.
But that might give some real performance boost on multi-processor
machines.
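The pipe-like producer/consumer chain could look roughly like the sketch below, using bounded queues from java.util.concurrent. Everything here is an assumption made for illustration: the stages only transform strings, and PipelineSketch, stage() and run() are hypothetical names that do not exist in FOP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

/** Hypothetical sketch of chained stages connected by bounded queues. */
class PipelineSketch {
    // End-of-input marker, compared by identity; new String() guarantees that
    // no real item can be the same object.
    private static final String END = new String("<END>");

    /** Starts a thread that is fed by 'in' and feeds 'out'. */
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        Function<String, String> work) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String item = in.take();   // blocks until the upstream stage delivers
                    if (item == END) {
                        out.put(END);          // pass the end marker downstream
                        return;
                    }
                    out.put(work.apply(item)); // blocks while the downstream queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Pushes the given "FO nodes" through two toy stages and collects the results. */
    static List<String> run(List<String> foNodes) {
        BlockingQueue<String> nodes = new ArrayBlockingQueue<>(4);
        BlockingQueue<String> elements = new ArrayBlockingQueue<>(4);
        BlockingQueue<String> areas = new ArrayBlockingQueue<>(4);

        stage(nodes, elements, n -> "elements(" + n + ")");
        stage(elements, areas, e -> "area(" + e + ")");

        // Stand-in for the parser thread that produces FO nodes.
        Thread producer = new Thread(() -> {
            try {
                for (String n : foNodes) {
                    nodes.put(n);
                }
                nodes.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();

        List<String> result = new ArrayList<>();
        try {
            for (String a = areas.take(); a != END; a = areas.take()) {
                result.add(a);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while collecting areas", e);
        }
        return result;
    }
}
```

The bounded queues give back-pressure: a fast producer blocks as soon as the stage after it falls behind, so no stage can run arbitrarily far ahead of the next. The sketch says nothing about the harder question raised above, namely where the real processes can be separated cleanly.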
That's roughly the idea a few of us are dreaming of implementing, I
think... if they had the time. :)
If you compare the codebase to FOP 0.20.5, you'll notice that a large
part of the redesign consisted precisely of separating the different
processes, mainly FO tree generation and layout. In FOP 0.20.5, a
great deal of layout-related code is scattered across the FObjs.
OTOH, before starting to use separate threads, I think some other
refactoring can/needs to be done first, if only to make it easier
afterwards to implement the threads. I was already playing with the
idea to move the LM-construction and some of the initialization to a
separate LMInitThread, but then bumped into the mentioned problem.
The list of child-nodes cannot be extended after the LM has been
constructed. The fo:page-sequence, or at least one of its fo:flows
must be completely parsed before instantiating a FlowLayoutManager.
[An attempt to work around this by dropping the ArrayList entirely,
and using an iterator over a virtual list without this particular
limitation can be found in Bugzilla 41656. Patch not applied yet,
since I haven't done any extensive testing of its effects. I only
know that the junit tests pass.]
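For readers unfamiliar with the limitation: an iterator over a java.util.ArrayList fails with a ConcurrentModificationException if the list is extended after the iterator was created. The sketch below shows one possible shape of a child list without that limitation. It is not the code from the Bugzilla patch; GrowableChildList is a hypothetical name, and the sketch is single-threaded only.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: an append-only child list whose iterators see later additions. */
class GrowableChildList<T> implements Iterable<T> {

    private static final class Node<T> {
        final T value;
        Node<T> next;

        Node(T value) {
            this.value = value;
        }
    }

    private final Node<T> head = new Node<>(null); // sentinel, holds no child
    private Node<T> tail = head;

    /** Appends a child; existing iterators will reach it once they get that far. */
    void add(T child) {
        Node<T> node = new Node<>(child);
        tail.next = node;
        tail = node;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> current = head;

            @Override
            public boolean hasNext() {
                // Evaluated lazily on every call, so it can become true again
                // after a child has been appended.
                return current.next != null;
            }

            @Override
            public T next() {
                if (current.next == null) {
                    throw new NoSuchElementException();
                }
                current = current.next;
                return current.value;
            }
        };
    }
}
```

A consumer holding such an iterator can exhaust it, wait for more children to be added, and then simply continue. Whether that is enough to instantiate a FlowLayoutManager before its fo:flow is completely parsed is a separate question that this sketch does not answer.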
The idea in the long run was to move towards a separation of concerns
inside the layoutengine, which could then later on make room for
something like a ListProducerThread and a BreakerThread.
In the worst case, at first without multi-threading, FOP would use
the exact same amount of heap as it does now, only distributed a bit
differently. More of it sooner than it does now, but in the end not
exceeding the current state, where we ultimately end up with a whole
FO tree for a page-sequence on the one hand /and/ a corresponding
tree of LayoutManagers. Not to mention all the Lists and ListElements
that are generated in between...
If you don't give the breaking-algorithm a chance to reset from time
to time, memory consumption shoots through the roof, as the number of
break-possibilities grows.
See also my recent investigation of an OOMError when a huge block of
#PCDATA is wrapped inside a single fo:block. A simple switch to
linefeed-treatment="preserve", which generates forced breaks inside
the text, made FOP pass the test without needing an absurd amount of
heap space. Without preserved linefeeds, I needed a minimum of 768MB
of heap to not run out of memory. For an output of around 35 pages...
I'm guessing that roughly the same thing applies to the page-breaking
algorithm when the page-sequence grows, consists of lots and lots of
smaller elements, and contains no explicit breaks: the more content,
the more possible pages, the more break-possibilities that need to be
remembered and compared, and the more time spent on computing a single
one of them...
Maybe this is inherent to the algorithm, I don't know, but I also
cannot rule out that it is (partly) a consequence of its implementation.
In the longer term, we could start thinking about clearing the
references to the FO nodes earlier. If a LayoutManager has no more
need for its FO, release it (and clear it in the FO tree as well). As
to how early this becomes possible, I'm not completely sure yet...
Once, when I noticed that a TextLM initializes itself with the
properties of its associated FOText and creates a copy of its char
array, I tried adding "FOText.ca = null;", and encountered a
NullPointerException further on, which indicated that the original
char array was in fact still needed at some point. Didn't investigate
it further at the time, though. Maybe that reference could be
replaced by one to the copy that resides in TextLM...
<snip />
Regarding the changing-IPD problem, I wrote some notes during the GSoC
last summer:
http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2006/FloatsImplementationProgress/ImplementingSideFloats#head-953fc5836ed422f91834ea15bf1e2515d0101300
I already explained my ideas to some of you. At some point I'll have
to write them down on a wiki page.
Indeed, I remember the discussion. IIRC, you mentioned one very
interesting idea about combining the line- and page-breaking loops,
which would be a huge step in the right direction. If my estimations
are correct, it would solve the above-mentioned scalability problem
of the algorithm, since it would no longer be forced to take into
account *all* preceding break-possibilities, but only a certain number
of buffered possibilities...
<snip />
That seems to give some confirmation to my thread ideas above: a
thread for creating FONodes, one for LMs, one for layout; change from
a pull model to a push model: instead of requiring the next LM, the LM
thread would notify the layout thread that a new LM is available.
Possibly while being itself notified by the FONode thread that new
nodes have been created.
Moving away from the pull model would be a good idea, but we don't
immediately need separate threads for that, IIC. I was thinking in
the direction of using AreaTreeHandler.startPageSequence() to
initialize the PageSequenceLM. The layout-master-set is available
anyway at that point.
As such, endBlock() could then be used as a sort of marker event to
signal to the PageSequenceLM that it can begin or resume its work...
A big benefit, if combined with your idea of merging the line- and
page-breaking loops, would be that the next page's ipd will be known
before the block's line-layout starts.
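A rough, single-threaded sketch of that push model follows. It assumes a toy event interface with String arguments; the real AreaTreeHandler methods take FO objects, and PushLayoutSketch, its fields and the laidOut() accessor are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: layout is driven by FO events instead of pulling child LMs. */
class PushLayoutSketch {
    private final Deque<String> pendingBlocks = new ArrayDeque<>();
    private final List<String> laidOut = new ArrayList<>();
    private String masterReference; // set once the page-sequence has started

    /** Stand-in for initializing the PageSequenceLM when the page-sequence starts. */
    void startPageSequence(String masterReference) {
        this.masterReference = masterReference;
    }

    /** Marker event: a block is complete, so layout can begin or resume. */
    void endBlock(String blockId) {
        if (masterReference == null) {
            throw new IllegalStateException("endBlock() before startPageSequence()");
        }
        pendingBlocks.add(blockId);
        resumeLayout();
    }

    // Processes whatever is available right now, then returns control to the parser.
    private void resumeLayout() {
        while (!pendingBlocks.isEmpty()) {
            laidOut.add(pendingBlocks.poll() + " on " + masterReference);
        }
    }

    List<String> laidOut() {
        return laidOut;
    }
}
```

No extra thread is needed in this form: the parser's own events drive the layout, which only ever works on blocks that are already complete.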
<snip />
The area tree that is the result of all this is then handed off to
the renderer, which basically translates the area tree structure into
PDF, PS, XML...
... but that's another story ;-)
Yep, and there are other people more suitable than me to tell it. :)
I have not yet explored the renderer code to the furthest possible
extent, so there's only little insight I can offer there.
That's it --for now :)
Those notes deserve their wiki page, to not get lost in the mailing
list archives. I'll create one as soon as I have time. The
documentation part of the website might also need some cleaning up,
BTW.
Simon also once published a description of 'FOP at work' on his
homepage, following and explaining the call stack of FOP while
processing a document. This was, however, quite a while ago, dating
from before the implementation of the Knuth algorithm. Maybe it
is still available and parts of it can be used to add to such a Wiki
or to the development documentation...
Cheers,
Andreas