On Oct 18, 2007, at 19:23, Vincent Hennebert wrote:
<snip />
OTOH, the above is semantically equivalent to (I think we had already
established that there should not be a double page-break here)
<fo:block break-before="page">
<fo:block>
<fo:block>
If the LMs were guaranteed to receive the 'normalized' form, the
break-condition could be tested for internally by the outer LM itself.
No need to look forward or back... The first descendants wouldn't even
need to check for breaks anymore.
I think I see your point. Basically you’re proposing a push method (an
LM notifies its parent LM that it has a break-before) while mine is a
pull method (an LM asks its child LMs whether they have a break-before).
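To make the contrast concrete, here is a minimal sketch of the pull variant, where an LM asks its first child (recursively) whether a break-before applies. `PullNode` and `hasBreakBeforeDeep()` are hypothetical stand-ins, not FOP's actual LayoutManager API, and break-before is reduced to a boolean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a block-level LM; not FOP's real LayoutManager API.
class PullNode {
    final String id;
    final boolean breakBefore;              // break-before="page" on this node's own FO
    final List<PullNode> children = new ArrayList<>();

    PullNode(String id, boolean breakBefore) {
        this.id = id;
        this.breakBefore = breakBefore;
    }

    PullNode add(PullNode child) {
        children.add(child);
        return this;
    }

    // Pull: ask the first child (and its first child, and so on) on demand.
    // Every call walks down the first-descendant chain again.
    boolean hasBreakBeforeDeep() {
        if (breakBefore) {
            return true;
        }
        return !children.isEmpty() && children.get(0).hasBreakBeforeDeep();
    }
}

class PullDemo {
    public static void main(String[] args) {
        PullNode outer = new PullNode("outer", false)
            .add(new PullNode("mid", false)
                .add(new PullNode("inner", true)));
        System.out.println(outer.hasBreakBeforeDeep()); // true
    }
}
```

The cost is paid at layout time, each time the question is asked, and it presumes all first descendants already have LMs.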
Yep, although it would not be the LM but rather the FO that pushes
the break-before upwards to its parent if it is also the first child.
The LMs would largely continue to work as they do now, except that
under a certain set of conditions they would no longer need to look
outside: each LM only takes into account the forced break on its own
FO. If there is none, there is no need to recursively check first
descendants for forced breaks.
Currently (sorry if it becomes boring to stress this) the
construction of the layout-tree starts only when the end-of-page-
sequence event occurs. I still see room for changing this in the
future, and so I need to consider the effects on the layout-algorithm
as well: the algorithm will, for instance, no longer be able to rely
on *all* childLMs being available the first time it enters the
loop... The last childLM in an iteration might turn out to be not-the-
last-one-after-all. For many following FONodes, the LMs do not exist
yet at that point. Not in my head, at least. ;-)
You’re more at the FO tree building stage, I’m more at the layout
stage. In terms of efficiency I think both methods are equivalent, as
the same number of method calls will be performed either way.
Right, but OTOH... it's more a matter of /when/ (in the process) that
happens.
The push method might be slightly more complicated to implement in
special cases like tables: when an fo:table-cell notifies its parent
fo:table-body that it has a break-before, the table-body must figure
out whether the cell lies in the first row or not.
Almost everything is /slightly/ more complicated in case of
fo:tables, especially those without explicit fo:table-rows or
fo:table-columns. ;-)
Anyway, I remember that when I implemented implicit column-numbers, I
also gave TableBody an instance member to check whether we are adding
cells in the first row or not, so this particular case would be
easily addressed. (Checking... yep, it's still there.)
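A minimal sketch of how a first-row flag could decide whether a cell's break-before is pushed up to the table-body. `SketchTableBody` and `cellFinished()` are hypothetical; FOP's actual TableBody member and callbacks differ, and row ends are passed in explicitly rather than derived from starts-row/ends-row or column counts.

```java
// Hypothetical sketch; FOP's actual TableBody code differs.
class SketchTableBody {
    private boolean inFirstRow = true;  // the kind of flag kept while cells are added
    boolean breakBefore;                // pushed up from a first-row cell

    // Called once per cell, in document order, after the cell's own subtree
    // has been built (so a break pushed up from its first block is known).
    void cellFinished(boolean cellBreakBefore, boolean endsRow) {
        if (cellBreakBefore && inFirstRow) {
            breakBefore = true;
        }
        if (endsRow) {
            inFirstRow = false;
        }
    }

    public static void main(String[] args) {
        SketchTableBody body = new SketchTableBody();
        body.cellFinished(false, false);
        body.cellFinished(true, true);   // second cell of the first row
        System.out.println(body.breakBefore); // true
    }
}
```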
Come to think of it, for tables I'd consider 'propagation' in terms of
pushing a forced break on a cell to the first cell in the row.
In the table-layout code, at the point where we have a reference to
the row or the first cell in a row, we would immediately know whether
there is a forced break on a first descendant in any of the following
sibling cells without having to request the corresponding childLMs
and trigger a tree-traversal of who-knows-how-many levels.
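A minimal sketch of recording a forced break found on any cell of a row on the row's first cell, so the table-layout code needs a single lookup. `SketchCell`, `SketchRow`, `endCell()` and `startsWithForcedBreak()` are hypothetical names, not FOP's table classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not FOP's table classes.
class SketchCell {
    boolean forcedBreak;        // pushed up from the cell's first descendant
    boolean rowHasForcedBreak;  // only meaningful on the first cell of a row
}

class SketchRow {
    final List<SketchCell> cells = new ArrayList<>();

    // Called when a cell's end tag is seen, i.e. once any break pushed up
    // from its descendants is known.
    void endCell(SketchCell cell) {
        cells.add(cell);
        if (cell.forcedBreak) {
            cells.get(0).rowHasForcedBreak = true;
        }
    }

    // Layout side: one field lookup on the first cell, instead of requesting
    // every cell's childLMs and traversing their descendants.
    boolean startsWithForcedBreak() {
        return !cells.isEmpty() && cells.get(0).rowHasForcedBreak;
    }

    public static void main(String[] args) {
        SketchRow row = new SketchRow();
        SketchCell last = new SketchCell();
        last.forcedBreak = true;
        row.endCell(new SketchCell());
        row.endCell(last);
        System.out.println(row.startsWithForcedBreak()); // true
    }
}
```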
Keeping in mind the above mentioned idea of triggering layout sooner,
if we can guarantee that the layoutengine always receives complete
rows, then the table-layout job should become a bit simpler in the
general use-case, while still not adding much complexity in trickier,
more exotic cases, like:
//table-cell/block[position() > [EMAIL PROTECTED]'page']
especially where the cell's column-number corresponds to the highest
column-number.
Triggering layout sooner is the only way we are ever going to get FOP
to accept arbitrarily large tables, without consuming massive amounts
of heap. A 'simple' grid of 5 x 500 cells generates +5000 FONodes
(table-cells must have at least one block each) that stay in memory
until the page-sequence is completely finished. I wonder how many
break-possibilities that generates... :/
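The node count above follows from counting only the cells and their mandatory blocks; rows, the table-body, columns and text nodes come on top of it, hence the "+".

```latex
\begin{aligned}
N_{\text{cells}} &= 5 \times 500 = 2500 \\
N_{\text{FONodes}} &\ge N_{\text{cells}} + N_{\text{blocks}} = 2500 + 2500 = 5000
\end{aligned}
```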
A matter of taste, probably, but I think I’d prefer the pull method:
the LM performs requests to the appropriate children LMs exactly when
and if needed.
The only thing an LM should initially pull/request from its children,
AFAIU, is a list of elements, given a certain LayoutContext.
When composing its own element list, an LM should ideally be able to
rely on the lists it receives from its children. Then add/delete/
update elements and (un)wrap, depending on context that is unknown or
irrelevant to the child.
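A minimal sketch of an LM that pulls only element lists from its children and adds what the children cannot know: a forced break from its own (normalized) FO and ordinary break opportunities between siblings. `Element`, `ChildLM`, `ParentLM` and the `int availableWidth` stand-in for a LayoutContext are all hypothetical, loosely in the spirit of Knuth-style box/penalty lists; assumes Java 9+ for `List.of`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element model; not FOP's actual ListElement/KnuthElement classes.
class Element {
    static final int FORCED = Integer.MIN_VALUE;  // penalty value meaning "break here"

    final boolean isBox;
    final int penalty;                            // ignored for boxes

    Element(boolean isBox, int penalty) {
        this.isBox = isBox;
        this.penalty = penalty;
    }

    boolean isForcedBreak() {
        return !isBox && penalty == FORCED;
    }
}

// The only thing a parent pulls from a child: a list of elements for a context.
interface ChildLM {
    List<Element> getElements(int availableWidth);  // stand-in for a LayoutContext
}

class ParentLM implements ChildLM {
    private final boolean ownBreakBefore;  // read from this LM's own, normalized FO
    private final List<ChildLM> children;

    ParentLM(boolean ownBreakBefore, List<ChildLM> children) {
        this.ownBreakBefore = ownBreakBefore;
        this.children = children;
    }

    @Override
    public List<Element> getElements(int availableWidth) {
        List<Element> result = new ArrayList<>();
        if (ownBreakBefore) {
            result.add(new Element(false, Element.FORCED));
        }
        boolean firstChild = true;
        for (ChildLM child : children) {
            List<Element> childList = child.getElements(availableWidth);
            if (childList.isEmpty()) {
                continue;
            }
            if (!firstChild) {
                // Context the child cannot know: an ordinary break opportunity
                // between two siblings (a keep would suppress or weight this).
                result.add(new Element(false, 0));
            }
            result.addAll(childList);
            firstChild = false;
        }
        return result;
    }
}

class ParentDemo {
    public static void main(String[] args) {
        ChildLM text = w -> List.of(new Element(true, 0));
        ParentLM parent = new ParentLM(true, List.of(text, text));
        System.out.println(parent.getElements(100).size()); // 4
    }
}
```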
That may simplify code as well (and improve its readability) as
some form of pull method is necessary anyway (the
mustKeepWithPrevious/WithNext/Together methods).
Keeps are a different story indeed. Big difference is that keeps have
strengths, and breaks do not.
Consider:
<fo:block id="b1">
...
<fo:block id="b2">
<fo:block id="b3" keep-with-previous.within-page="...">
<fo:block id="b4">
<fo:block id="b5" break-before="page">
This may be a matter of interpretation: you cannot specify a 'strength'
for a break. It is either there or not. I take this to mean that a
forced break overrules any keep.
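The asymmetry between keeps and breaks can be sketched as follows, assuming keep strengths are encoded as integers where 0 stands for "auto" and `Integer.MAX_VALUE` for "always", and taking the reading above that a forced break overrules any keep. `Keeps`, `combine()`, `resolve()` and `Decision` are hypothetical, not FOP's keep classes.

```java
// Hypothetical model of keep strengths; FOP's own keep classes differ.
final class Keeps {
    static final int AUTO = 0;                    // keep="auto": no keep at all
    static final int ALWAYS = Integer.MAX_VALUE;  // keep="always": beats any integer

    enum Decision { FORCED, FORBIDDEN, DISCOURAGED, ALLOWED }

    private Keeps() { }

    // Several keeps at the same break opportunity: the strongest one wins.
    static int combine(int a, int b) {
        return Math.max(a, b);
    }

    // A forced break has no strength: it is either present or not, and on the
    // reading discussed here it overrules any keep, even keep="always".
    static Decision resolve(boolean forcedBreak, int keep) {
        if (forcedBreak) {
            return Decision.FORCED;
        }
        if (keep == ALWAYS) {
            return Decision.FORBIDDEN;
        }
        return keep > AUTO ? Decision.DISCOURAGED : Decision.ALLOWED;
    }

    public static void main(String[] args) {
        int keep = combine(3, ALWAYS);
        System.out.println(resolve(true, keep));   // FORCED
        System.out.println(resolve(false, keep));  // FORBIDDEN
    }
}
```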
Main advantage to the layoutengine would be that forced breaks are
known as early as possible: the break is either there, on the FO,
when the LM is initialized --propagated upwards from a first child,
maybe seven or eight levels down--, or it is not.
The above can be normalized at parse time, with only a marginal cost,
so that the break is propagated upwards to block b2, and the keep is
suppressed before any LM is even created.
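The normalization of the b1 to b5 example can be sketched as a post-order pass over an already built tree (the push variant would do the same while building). `FoBlock` and `normalize()` are hypothetical; the sketch assumes b2 contains b3, which contains b4, which contains b5, and it ignores borders, padding and space-before on the containers, which is exactly where semantic equivalence would still need checking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical FO node for the b1..b5 example; only the two traits that matter here.
class FoBlock {
    final String id;
    boolean breakBefore;        // break-before="page"
    boolean keepWithPrevious;   // keep-with-previous.within-page other than "auto"
    final List<FoBlock> children = new ArrayList<>();

    FoBlock(String id) {
        this.id = id;
    }

    FoBlock add(FoBlock child) {
        children.add(child);
        return this;
    }

    // Post-order: children first, so a break found several levels down
    // has already been moved up to this node's first child.
    void normalize() {
        for (FoBlock child : children) {
            child.normalize();
        }
        if (!children.isEmpty() && children.get(0).breakBefore) {
            children.get(0).breakBefore = false;
            breakBefore = true;        // a double break collapses into one
        }
        if (breakBefore) {
            keepWithPrevious = false;  // a forced break overrules any keep
        }
    }

    public static void main(String[] args) {
        FoBlock b5 = new FoBlock("b5");
        b5.breakBefore = true;
        FoBlock b3 = new FoBlock("b3").add(new FoBlock("b4").add(b5));
        b3.keepWithPrevious = true;
        FoBlock b2 = new FoBlock("b2").add(b3);
        FoBlock b1 = new FoBlock("b1").add(new FoBlock("b0")).add(b2);
        b1.normalize();
        System.out.println(b2.breakBefore + " " + b3.keepWithPrevious); // true false
    }
}
```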
I believe you already mentioned this idea of normalizing/simplifying
the FO tree in the past. Note that it may exist in parallel as it
addresses a different general issue. One concern I’d have is to make
sure that a simplification leads to a semantically equivalent result.
That is precisely the purpose of normalization: to remove ambiguities
at a point where it is still relatively simple. Ambiguities that
would otherwise cause a significant amount of checks or tree- or list-
traversals later on to get every possible scenario right. (FWIW: XEP
also normalizes the input FO, but there it happens by means of an
XSLT; IIRC, they normalize tables to always have columns and rows,
for example; implicit column-numbers can also quite easily be
computed/assigned as part of an XSL Transform)
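For comparison, the implicit column-number computation expressed in Java rather than XSLT, as a deliberately simplified sketch: within one row, a cell without an explicit column-number gets the previous cell's column-number plus that cell's number-columns-spanned, and the first cell gets 1. Cells spanning down from earlier rows are ignored. `ColumnNumbers.assign()` is a hypothetical helper.

```java
import java.util.Arrays;

// Hypothetical, simplified: explicit column-number and number-columns-spanned
// within one row only; row spans from earlier rows are not taken into account.
final class ColumnNumbers {
    private ColumnNumbers() { }

    // explicit[i] is the cell's column-number, or 0 if none was specified.
    static int[] assign(int[] explicit, int[] spans) {
        int[] result = new int[explicit.length];
        int next = 1;                        // first cell in a row: column 1
        for (int i = 0; i < explicit.length; i++) {
            result[i] = explicit[i] > 0 ? explicit[i] : next;
            next = result[i] + spans[i];     // previous column-number + its span
        }
        return result;
    }

    public static void main(String[] args) {
        int[] cols = assign(new int[] {0, 0, 0}, new int[] {1, 2, 1});
        System.out.println(Arrays.toString(cols)); // [1, 2, 4]
    }
}
```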
Given the complexity of the spec, that might be difficult to
establish. I’m also not sure whether the overhead is compensated by the
gain in the further processes (layout, area tree generation). But
that’s a different topic.
The key advantage in the longer term is that the start of those
further processes can be triggered sooner, without adding too much
complexity to the related source code.
Agreed with the concerns, but I'm wondering if these portions of code,
instead of extracting them into a separate class, could be centralized
in, say, BlockStackingLM and InlineStackingLM...?
I thought of that, but a separate class looked cleaner to me for some
reasons:
- the LM classes are already overcrowded with many different concerns
True.
- the code would be about the same for Block- and InlineStackingLM
- we could factorize it into a common super-class AbstractStackingLM...?
I kind of like the idea. For the really shared portions,
AbstractStackingLM could then implement a set of static methods.
but both those classes have subclasses to which breaks don’t apply
(FlowLM and StaticContentLM, for example).
I wouldn't really see this as a problem. The related methods will
never be called, unless there is a flaw in our logic[*]. To stress
the fact that they serve no purpose there, we could add overrides
that always return false.
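A minimal sketch of the common super-class idea: the break logic lives once in a shared base class, and a subclass to which breaks don't apply overrides it to return false. All class names here (`AbstractStackingLM`, `SketchBlockStackingLM`, `SketchInlineStackingLM`, `SketchFlowLM`) are hypothetical illustrations of the proposal, not FOP's current classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes named after the discussion; FOP's real LMs differ.
abstract class AbstractStackingLM {
    final List<AbstractStackingLM> childLMs = new ArrayList<>();
    boolean breakBeforeOnOwnFO;

    // Shared once for both the block and the inline branch: the LM itself,
    // or its first child (recursively), carries a break-before.
    boolean mustBreakBefore() {
        if (breakBeforeOnOwnFO) {
            return true;
        }
        return !childLMs.isEmpty() && childLMs.get(0).mustBreakBefore();
    }
}

class SketchBlockStackingLM extends AbstractStackingLM { }

class SketchInlineStackingLM extends AbstractStackingLM { }

class SketchFlowLM extends SketchBlockStackingLM {
    // Breaks do not apply to a flow. This override should never be reached
    // unless the calling logic is flawed; it only makes the intent explicit.
    @Override
    boolean mustBreakBefore() {
        return false;
    }
}

class StackingDemo {
    public static void main(String[] args) {
        SketchBlockStackingLM outer = new SketchBlockStackingLM();
        SketchBlockStackingLM inner = new SketchBlockStackingLM();
        inner.breakBeforeOnOwnFO = true;
        outer.childLMs.add(inner);
        System.out.println(outer.mustBreakBefore());              // true
        System.out.println(new SketchFlowLM().mustBreakBefore()); // false
    }
}
```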
[*] (They won't be called, precisely because breaks don't apply?)
OTOH keeps apply to AbstractGraphicsLM which doesn’t
inherit any of those classes.
That's a special case, since in principle a graphic does not itself
consist of more layout-objects that need to be stacked. To the
layoutengine, a graphic is simply a monolithic box. Graphics are
inline by definition nonetheless, so it could be InlineStackingLM
with the same reservations as for FlowLM and StaticContentLM, but for
other methods (the actual 'inline-stacking' can be considered to be
delegated to the producer of the graphic, here).
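Since keeps also apply to graphics, which sit outside the stacking hierarchy, one alternative to a shared super-class is a Java 8 interface with default methods. This is a swapped-in technique, not something proposed in the thread; `KeepAware`, `SketchGraphicsLM` and the integer keep encoding (0 for "auto") are hypothetical.

```java
// Hypothetical: keep handling shared through an interface with default methods
// (Java 8+) rather than a common superclass.
interface KeepAware {
    int keepWithPreviousStrength();    // 0 stands for keep="auto"
    int keepWithNextStrength();

    default boolean mustKeepWithPrevious() {
        return keepWithPreviousStrength() > 0;
    }

    default boolean mustKeepWithNext() {
        return keepWithNextStrength() > 0;
    }
}

// A graphic is a monolithic box: it carries keeps of its own,
// but has no child LMs to stack and no break logic.
class SketchGraphicsLM implements KeepAware {
    private final int keepWithPrevious;
    private final int keepWithNext;

    SketchGraphicsLM(int keepWithPrevious, int keepWithNext) {
        this.keepWithPrevious = keepWithPrevious;
        this.keepWithNext = keepWithNext;
    }

    @Override
    public int keepWithPreviousStrength() {
        return keepWithPrevious;
    }

    @Override
    public int keepWithNextStrength() {
        return keepWithNext;
    }

    public static void main(String[] args) {
        KeepAware graphic = new SketchGraphicsLM(0, 1);
        // prints "false true"
        System.out.println(graphic.mustKeepWithPrevious() + " " + graphic.mustKeepWithNext());
    }
}
```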
Cheers
Andreas