On Dec 31, 2005, at 16:05, Manuel Mall wrote:
[Me:]
Well, it's definitely not impossible, but I'm wondering a bit about
Cost vs. Benefit. Currently, when the trailing spaces for any inline
are treated --in Inline.endOfNode()-- one has no way of knowing
whether any text will still follow --possible subsequent nested
inlines, text or characters will not be available yet.
This indicates to me that your redesigned algorithm has the same flaws
as we currently encounter with the inline layout manager structure. Any
problems which require looking across FO (= LM) boundaries suddenly
become hard. BTW, the original block level whitespace handling
refinement didn't have that problem as it had the whole block content
available to it. So I still think we have regressed here.
Maybe so... but I'm looking at this as taking a step backwards like
one does before taking a leap.
Besides that, it is not a *flaw* per se. Strictly speaking, white-
space collapsing/removal applies to sibling character nodes in the
source document. The fact that leading white-space in a paragraph can
be removed during refinement without any real extra effort is a
convenience, a bonus that follows from the preceding text-nodes or
inline-nodes already being processed (= the state indicated by the
'inWhiteSpace' and 'afterLinefeed' variables can be carried over).
There is no need for look-behind here (the previous algorithm didn't
do so either).
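
To make the carried-over state concrete, here is a minimal Java sketch. The class and method names are hypothetical and this is not the actual FOP code. It tracks only 'inWhiteSpace' (an 'afterLinefeed' flag would be carried along in the same way) and assumes linefeeds are treated as spaces.

```java
// Hypothetical sketch, not the real FOP classes.
final class WhiteSpaceCollapser {
    // Carried over from one text node to the next. Starting out as 'true'
    // is what makes leading white-space in the block disappear for free.
    private boolean inWhiteSpace = true;

    /** Collapses white-space runs in one text node, using the state left by earlier nodes. */
    String process(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                if (!inWhiteSpace) {
                    out.append(' ');   // keep the first space of a run only
                }
                inWhiteSpace = true;
            } else {
                out.append(c);
                inWhiteSpace = false;
            }
        }
        return out.toString();
    }
}
```

Feeding two sibling nodes through the same instance shows that no look-behind is needed: the second call simply starts in whatever state the first one ended in. Note that the trailing space of the last node survives, which is exactly the case under discussion.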
The possible problem I saw with the block-level white-space handling
was that all white-space characters would continue to take up memory
until the first nested block or in the worst case, until the end-of-
block. In case of large blocks with lots of indents due to pretty-
printing, the current approach makes these spaces disappear much
sooner (= more memory-efficient).
When I talk about cost/benefit, I refer to the fact that we already
get two passes over the same character sequences:
- once when building the FOTree
- another when performing layout
In order to implement this trailing white-space removal for nested
trailing inlines during refinement --I can't stress it enough: a
*purely* aesthetic matter; the conceptual/logical necessity still
escapes me...-- we would have to add a third pass.
In theory, we could keep a reference alive to the last FOText of the
previous inline, so that when it appears at the end of the block, we
could strip its trailing white-space too.
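
A rough Java sketch of that idea, under the assumption that text nodes are mutable buffers. All names here are hypothetical, not FOP's API, and the sketch does not walk further back when the last text node consists of white-space only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the real FOP classes.
final class BlockSketch {
    private final List<StringBuilder> textNodes = new ArrayList<>();
    // Reference kept alive to the most recently seen text node,
    // whether it sat directly in the block or inside a nested inline.
    private StringBuilder lastText;

    void addText(String text) {
        lastText = new StringBuilder(text);
        textNodes.add(lastText);
    }

    /** Called at end-of-block: strips trailing white-space from the last text node. */
    void endOfBlock() {
        if (lastText == null) {
            return;
        }
        int end = lastText.length();
        while (end > 0 && Character.isWhitespace(lastText.charAt(end - 1))) {
            end--;
        }
        lastText.setLength(end);
    }

    String content() {
        StringBuilder all = new StringBuilder();
        for (StringBuilder node : textNodes) {
            all.append(node);
        }
        return all.toString();
    }
}
```

The price is the extra reference and a second visit to the tail of that node, which is the additional pass mentioned above.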
Yes, that is what you get when doing this FO-centric. You have to keep
context / state / global variables to deal with "cross border" issues.
Carrying over the context is no problem when it comes to previous
nodes, but you simply don't have the luxury of look-ahead in the
FOTree --that is, look-ahead is limited to the nodes already
available at that point. One way to deal with it is to accumulate
all nodes, and only process them at the end-of-block/nested blocks.
This has the above mentioned drawback --space characters taking up
resources far longer than strictly necessary.
OTOH, look-ahead in the FOTree isn't really required for anything
(apart from maybe this particular scenario).
The layout algorithm *needs* to be able to move/look in both
directions anyway, so AFAICT, it shouldn't be too much effort to
handle trailing spaces for trailing nested inlines there... If that
is such a difficult matter, then one should doubt the layout-
algorithm, if anything, instead of trying to work around the lack of
look-ahead in the FOTree.
[Me:]
Apart from the aesthetic argument (nice symmetry): why exactly?
Again, IMO, if the right element-sequences are generated for these
white-spaces, they should be suppressed at the end of the paragraph
anyway (forced EOL).
It's not a matter of generating the correct Knuth element sequences
because the algorithm doesn't care about what is at the beginning or
end of a paragraph. Giving the correct (= whitespace handled) paragraph
to the Knuth algorithm is a precondition. Again: line breaking deals
with adding breaks at optimal allowable points within the text; it
doesn't care what's at the start and end.
Et voilà, that seems to be where the real *flaw* is located, if you
ask me. It should care about glues at the beginning of a line --which
it seems to handle perfectly ATM-- regardless of whether it's the
first line in a paragraph or not. In the same way, it should care
about glues at the end of a line, regardless of whether it is the
last line in a paragraph or not.
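
What I have in mind could be sketched in Java as follows. This is a simplified illustration with hypothetical names, not the actual layout code: a line is reduced to a list of element kinds, and glue is discarded at both edges of every line, without asking where in the paragraph the line sits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the real FOP layout classes.
final class LineTrimSketch {
    enum Kind { BOX, GLUE, PENALTY }

    /**
     * Returns the elements of one line with glue removed at its start and end.
     * The same rule applies to every line, first, middle or last.
     */
    static List<Kind> trimLine(List<Kind> line) {
        int start = 0;
        int end = line.size();
        while (start < end && line.get(start) == Kind.GLUE) {
            start++;
        }
        while (end > start && line.get(end - 1) == Kind.GLUE) {
            end--;
        }
        return new ArrayList<>(line.subList(start, end));
    }
}
```

With a rule like this applied uniformly, trailing spaces from a nested inline at the end of the paragraph would be dropped in layout, and the FOTree would not need any look-ahead for them.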
Besides that, I get the impression you're somewhat contradicting
yourself here:
- in the comment on the failing testcase you noted that 'These tests
fail because the Knuth element sequences for consecutive whitespace
are not correct.'
- and now you're saying that it's not a matter of generating the
correct element sequences
Can you clarify? Doesn't this indicate that there is a difference in
processing between the last line in a paragraph and all other
lines... which seems inconsistent. A line is a line is a line, no
matter at what position in the paragraph we find ourselves.
Cheers,
Andreas