Hi all,
(Manuel, I guess this is mostly directed to you, as you may already
have been browsing the same classes...)
Just wandering a bit through the FOText source code (follow-up on
Manuel's recent thread on whitespace handling), and I stumbled upon
the following suspicious little detail:
FOText has a static member 'lastFOTextProcessed', which doesn't seem
to get cleared/flushed anywhere.
The intention is quite clear, but the possible effects of the current
implementation may turn out rather nasty. IIC, this is what the
warning is about in the FOText javadoc as well as the TODO for that
member variable.
Rough guess: since the variable never gets cleared, it always holds a
reference to the last FOText instance processed (and through it, to
the char array containing the last portion of accumulated text, as
well as to the previous FOText, etc.) --even after the document has
finished, and into the next run if that happens within the same JVM
(+ possible multi-thread mayhem?).
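
To make the concern concrete, here is a minimal sketch of the pattern. It is not the actual FOText code: TextNode, lastProcessed, previous, reachableCount() and reset() are hypothetical names, and the only assumption carried over is that each instance links back to the one processed before it.

```java
// Hypothetical stand-in for FOText; NOT the actual FOP class.
class TextNode {
    // Static "last processed" pointer, shared by every document in the JVM.
    static TextNode lastProcessed = null;

    final char[] chars;       // the accumulated text
    final TextNode previous;  // link back to the node processed before this one

    TextNode(String text) {
        this.chars = text.toCharArray();
        this.previous = lastProcessed;
        lastProcessed = this;
    }

    // Counts how many nodes are still reachable through the static field.
    static int reachableCount() {
        int n = 0;
        for (TextNode t = lastProcessed; t != null; t = t.previous) {
            n++;
        }
        return n;
    }

    // What an explicit reset (per document, page-sequence or block) could look like.
    static void reset() {
        lastProcessed = null;
    }
}

class StaticLeakDemo {
    // Simulates one rendering run that creates a text node per string.
    static void processDocument(String... texts) {
        for (String s : texts) {
            new TextNode(s);
        }
    }

    public static void main(String[] args) {
        processDocument("first", "run");
        processDocument("second run"); // a later run in the same JVM
        System.out.println(TextNode.reachableCount()); // nodes from both runs
        TextNode.reset();
        System.out.println(TextNode.reachableCount());
    }
}
```

Without the reset() call, the nodes of the first run stay reachable (and so cannot be garbage-collected) for as long as the class is loaded. The sketch does nothing about concurrent access to the static field.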
The TODO hints at a solution involving the page-sequence. I somehow
feel that moving it to the block level would be enough... Logically,
whitespace handling --which is one of the prime reasons for the
existence of this static variable-- deals with line-breaks, and the
start and end of a block act as implicit line-breaks (start-block is
implicitly after-eol, end-block implicitly before-eol).
To follow up on that last sentence, the current whitespace handling
during refinement works roughly as follows:
1. Add all text and inline children to the block, until the first non-
inline child is encountered (or the block ends)
2. Recursively iterate over *all* text nodes anywhere in the block up
to here, converting/removing any superfluous whitespace in the process
and (+/-) repeat the above for each uninterrupted sequence of text/
inline children in the block.
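
The two steps above can be sketched roughly as follows. This is a simplified model, not the FOP implementation: RunCollapser is a hypothetical name, a null entry stands in for a non-inline child, each String stands in for a text node, and the only rule applied is "collapse whitespace runs to one space and drop whitespace at the start of a run". The real behaviour depends on the whitespace-related properties, and trailing whitespace is left alone here.

```java
import java.util.ArrayList;
import java.util.List;

class RunCollapser {
    // Hypothetical child model: a null entry marks a non-inline (block-level) child.
    static List<List<String>> process(List<String> children) {
        List<List<String>> runs = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String child : children) {
            if (child == null) {
                flush(pending, runs); // step 1: a non-inline child closes the run
            } else {
                pending.add(child);
            }
        }
        flush(pending, runs);         // the block ends
        return runs;
    }

    // Step 2: walk all text of the run once, collapsing across node boundaries.
    private static void flush(List<String> pending, List<List<String>> runs) {
        if (pending.isEmpty()) {
            return;
        }
        List<String> out = new ArrayList<>();
        boolean prevWasSpace = true; // start of run counts as "after a line-break"
        for (String text : pending) {
            StringBuilder sb = new StringBuilder();
            for (char c : text.toCharArray()) {
                boolean ws = Character.isWhitespace(c);
                if (!ws) {
                    sb.append(c);
                } else if (!prevWasSpace) {
                    sb.append(' ');
                }
                prevWasSpace = ws;
            }
            out.add(sb.toString());
        }
        runs.add(out);
        pending.clear();
    }
}
```

For the children "  Hello ", " world", (block), "next  run", this yields two runs: ["Hello ", "world"] and ["next run"]. Note how the space at the start of the second text node is swallowed because the previous node ended in whitespace.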
Seems to work nicely, for the most part.
Manuel already raised the issue of inappropriate inter-FO whitespace-
collapsing, but I have another question. Given this algorithm, and
knowing that the inlines do not do any whitespace-handling
themselves, what happens in the following case:
<fo:block>
  <fo:inline>
    <fo:block>
      <fo:inline>
        <fo:block>
          ...
?
My current best guess is that the inner block's underlying character
sequence will be 'recursively' iterated over three times (?). That
would be two too many, since all whitespace will already have been
collapsed the first time around.
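
A toy model of that guess, purely to show where the count of three would come from. It assumes (and this is exactly the open question) that each block's pass descends through inlines into nested blocks as well. ToyNode and its methods are hypothetical and unrelated to the real FO tree classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node: a block, an inline, or (when childless) a text leaf.
class ToyNode {
    final boolean isBlock;
    final List<ToyNode> children = new ArrayList<>();
    int visits = 0; // how often a whitespace pass touched this text leaf

    ToyNode(boolean isBlock) {
        this.isBlock = isBlock;
    }

    ToyNode add(ToyNode child) {
        children.add(child);
        return this;
    }

    // Recursive walk over every text leaf below this node (ASSUMED behaviour).
    void visitAll() {
        if (children.isEmpty()) {
            visits++;
            return;
        }
        for (ToyNode c : children) {
            c.visitAll();
        }
    }

    // Post-order: each block runs its pass when it ends; inlines do nothing.
    void endOfNode() {
        for (ToyNode c : children) {
            c.endOfNode();
        }
        if (isBlock) {
            visitAll();
        }
    }
}

class NestedBlockDemo {
    public static void main(String[] args) {
        ToyNode text = new ToyNode(false);
        ToyNode inner = new ToyNode(true).add(text);
        ToyNode middle = new ToyNode(true).add(new ToyNode(false).add(inner));
        ToyNode outer = new ToyNode(true).add(new ToyNode(false).add(middle));
        outer.endOfNode();
        System.out.println(text.visits); // one visit per enclosing block
    }
}
```

Under that assumption the innermost text is visited once by each of the three enclosing blocks, and the number grows with the nesting depth.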
I'm still chewing on some ideas to move part of this to InlineLevel,
so that ultimately, we can do away with the recursion and let each
level handle its own small part. The higher level then chains these
small parts together with its own character content.
One way to make this happen would be to overload
Block.handleWhiteSpace() to deal with an InlineLevel parameter. This
has the advantage of the whitespace-related properties being easily
available. The call to this overloaded method would be made from
InlineLevel.endOfNode().
If you're still following: I'd use a CharIterator that iterates over
regular characters, fo:characters (and possibly the first and last
characters of any nested FO). This iterator can operate very easily
on both inlines and blocks. I don't immediately see any need to
iterate backwards, at least not during refinement. The big advantage
here would be that we can wait until Block.endOfNode() to deal with
white-space for the block as a whole (leading and trailing). The
nested bits will already have done their part by that point, so the
work is done sooner and far more efficiently IIC (guaranteed only one
pass per level, no matter how deep the nesting goes).
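
A minimal sketch of the chaining idea, assuming a forward-only iterator is indeed sufficient. It uses plain java.util.Iterator<Character> rather than FOP's own CharIterator, and ChainedCharIterator, OnePassDemo, of() and collapse() are hypothetical names. It only illustrates chaining the parts and handling leading/trailing whitespace in a single forward pass; it does not model the overloaded Block.handleWhiteSpace() or the property lookups.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Forward-only iterator that chains the character content of several parts.
class ChainedCharIterator implements Iterator<Character> {
    private final Iterator<Iterator<Character>> parts;
    private Iterator<Character> current = Collections.emptyIterator();

    ChainedCharIterator(List<Iterator<Character>> parts) {
        this.parts = parts.iterator();
    }

    @Override
    public boolean hasNext() {
        // Skip over exhausted (or empty) parts.
        while (!current.hasNext() && parts.hasNext()) {
            current = parts.next();
        }
        return current.hasNext();
    }

    @Override
    public Character next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}

class OnePassDemo {
    // Wraps a String as a character iterator (stand-in for a text node or inline).
    static Iterator<Character> of(String s) {
        return s.chars().mapToObj(c -> (char) c).iterator();
    }

    // Single forward pass: collapses whitespace runs, drops leading and trailing.
    static String collapse(Iterator<Character> it) {
        StringBuilder sb = new StringBuilder();
        boolean pendingSpace = false;
        while (it.hasNext()) {
            char c = it.next();
            if (Character.isWhitespace(c)) {
                pendingSpace = sb.length() > 0; // ignore whitespace before any text
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                }
                sb.append(c);
                pendingSpace = false;
            }
        }
        return sb.toString(); // a still-pending trailing space is never emitted
    }
}
```

Chaining of("  a ") and of(" b  ") and passing the result to collapse() gives "a b". Deferring the space until the next non-whitespace character is what removes the need to look backwards.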
Food for thought :-)
Cheers,
Andreas