On Dec 31, 2005, at 16:05, Manuel Mall wrote:

[Me:]
Well, it's definitely not impossible, but I'm wondering a bit about
cost vs. benefit. Currently, when the trailing spaces of an inline
are handled (in Inline.endOfNode()), there is no way of knowing
whether any text will still follow: possible subsequent nested
inlines, text or characters are not available yet.


This indicates to me that your redesigned algorithm has the same flaws
as we currently encounter with the inline layout manager structure. Any
problems which require looking across FO (= LM) boundaries suddenly
become hard. BTW, the original block-level whitespace handling
refinement didn't have that problem, as it had the whole block content
available to it. So I still think we have regressed here.

Maybe so... but I see this as taking a step back, like one does before taking a leap.

Besides that, it is not a *flaw* per se. Strictly speaking, white-space collapsing/removal applies to sibling character nodes in the source document. The fact that leading white-space in a paragraph can be removed during refinement without any real extra effort is a convenience, a bonus that follows from the preceding text nodes or inline nodes already having been processed (the state indicated by the 'inWhiteSpace' and 'afterLinefeed' variables can be carried over). There is no need for look-behind here (the previous algorithm didn't use it either).
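To make the state carry-over concrete, here is a minimal Java sketch. The class WhiteSpaceCollapser and its handle() method are hypothetical, not FOP code, and only the 'inWhiteSpace' flag is modelled ('afterLinefeed' and the white-space-treatment properties are left out). Each call processes one sibling text node and the flag carries over to the next call, so leading white-space in the block is dropped and runs spanning node boundaries collapse to a single space, all without look-behind.

```java
/**
 * Hypothetical, simplified sketch (not FOP code): collapses white-space
 * across sibling text nodes by carrying state from one node to the next.
 */
class WhiteSpaceCollapser {
    // true while the last character seen (emitted or dropped) was white-space;
    // starting as true means white-space at the start of the block is dropped
    private boolean inWhiteSpace = true;

    /** Processes one text node; the flag carries over to the next call. */
    String handle(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean ws = c == ' ' || c == '\t' || c == '\n' || c == '\r';
            if (ws) {
                if (!inWhiteSpace) {
                    out.append(' '); // only the first char of a run survives, as a space
                }
                inWhiteSpace = true;
            } else {
                out.append(c);
                inWhiteSpace = false;
            }
        }
        return out.toString();
    }
}
```

Feeding it "  \n  Hello   " and then "  world  " yields "Hello " and "world ". The trailing space of the last node survives, which is exactly the case that would need look-ahead to remove at this stage.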

The possible problem I saw with the block-level white-space handling was that all white-space characters would continue to take up memory until the first nested block or, in the worst case, until the end of the block. For large blocks with lots of indentation due to pretty-printing, the current approach makes these spaces disappear much sooner (i.e. it is more memory-efficient).

When I talk about cost/benefit, I refer to the fact that we already get two passes over the same character sequences:
- once when building the FOTree
- another when performing layout

In order to implement this trailing white-space removal for nested trailing inlines during refinement (I can't stress it enough: a *purely* aesthetic matter; the conceptual/logical necessity still escapes me...), we would have to add a third pass.

In theory, we could keep a reference alive to the last FOText of the
previous inline, so that when it appears at the end of the block, we
could strip its trailing white-space too.

Yes, that is what you get with an FO-centric approach. You have to keep
context / state / global variables to deal with "cross border" issues.

Carrying over the context is no problem when it comes to previous nodes, but you simply don't have the luxury of look-ahead in the FOTree; look-ahead is limited to the nodes already available at that point. One way to deal with this is to accumulate all nodes and only process them at the end of the block or at the first nested block. This has the drawback mentioned above: space characters take up resources far longer than strictly necessary.
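The alternative of keeping a reference to the last FOText of the previous inline can be sketched the same way. This is again a hypothetical, simplified Java sketch: TrailingSpaceTracker, addText() and endOfBlock() are made-up names, and the input is assumed to be already collapsed, so each node ends in at most one space. The list of nodes only stands in for the FO tree, which keeps the nodes around anyway; the extra state is a single reference, not a buffer of pending white-space.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not FOP code): remembers the last non-empty text node
 * so that its trailing space can be stripped once the end of the block is seen.
 */
class TrailingSpaceTracker {
    // stands in for the FO tree, which holds the nodes regardless
    private final List<StringBuilder> nodes = new ArrayList<>();
    // the only extra state: the last non-empty text node seen so far
    private StringBuilder lastText;

    /** Adds an already-collapsed text node. */
    void addText(String collapsed) {
        StringBuilder node = new StringBuilder(collapsed);
        nodes.add(node);
        if (node.length() > 0) {
            lastText = node; // empty nodes never become the "last" text
        }
    }

    /** Called at end-of-block: only the remembered node is touched. */
    void endOfBlock() {
        if (lastText != null) {
            int len = lastText.length();
            while (len > 0 && lastText.charAt(len - 1) == ' ') {
                len--;
            }
            lastText.setLength(len);
        }
    }

    /** Concatenated content of all nodes, for inspection. */
    String content() {
        StringBuilder sb = new StringBuilder();
        for (StringBuilder n : nodes) {
            sb.append(n);
        }
        return sb.toString();
    }
}
```

For example, adding "Hello ", "world " and "" and then calling endOfBlock() leaves the content as "Hello world". The catch, as noted above, is that this only works at end-of-block; it does not help if the information is needed earlier.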

OTOH, look-ahead in the FOTree isn't really required for anything (apart from maybe this particular scenario). The layout algorithm *needs* to be able to move/look in both directions anyway, so AFAICT it shouldn't be too much effort to handle trailing spaces for trailing nested inlines there... If that is such a difficult matter, then, if anything, one should question the layout algorithm instead of trying to work around the lack of look-ahead in the FOTree.

[Me:]
Apart from the aesthetic argument (nice symmetry): why exactly?
Again, IMO, if the right element-sequences are generated for these
white-spaces, they should be suppressed at the end of the paragraph
anyway (forced EOL).


It's not a matter of generating the correct Knuth element sequences,
because the algorithm doesn't care about what is at the beginning or
end of a paragraph. Giving the correct (= whitespace-handled) paragraph
to the Knuth algorithm is a precondition. Again: line breaking deals
with adding breaks at optimal allowable points within the text; it
doesn't care what's at the start and end.

Et voilà, that seems to be where the real *flaw* is located, if you ask me. It should care about glue elements at the beginning of a line (which it seems to handle perfectly ATM) regardless of whether it's the first line in a paragraph or not. In the same way, it should care about glue elements at the end of a line, regardless of whether it is the last line in a paragraph or not.
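What "a line is a line" could mean on the layout side can be shown with another hypothetical Java sketch. LineTrimmer, Item and Kind are illustrative names, not FOP's Knuth element classes. The idea is that one trimming rule drops glue and penalties at both ends of every line, with no special case for the first or last line of the paragraph. The Knuth-Plass model already discards glue and penalties that follow a chosen break, at the start of the next line; this sketch simply applies the same rule at the end of each line as well.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not FOP's line-breaking code): trims non-box items
 * at both ends of a line using the same rule for every line.
 */
class LineTrimmer {
    enum Kind { BOX, GLUE, PENALTY }

    static final class Item {
        final Kind kind;
        final int width;

        Item(Kind kind, int width) {
            this.kind = kind;
            this.width = width;
        }
    }

    /** Drops glue and penalties at the start and end of a line, whatever its position. */
    static List<Item> trim(List<Item> line) {
        int start = 0;
        int end = line.size();
        while (start < end && line.get(start).kind != Kind.BOX) {
            start++;
        }
        while (end > start && line.get(end - 1).kind != Kind.BOX) {
            end--;
        }
        return new ArrayList<>(line.subList(start, end));
    }

    /** Natural width of a line (penalty widths are ignored in this sketch). */
    static int naturalWidth(List<Item> line) {
        int w = 0;
        for (Item i : line) {
            if (i.kind != Kind.PENALTY) {
                w += i.width;
            }
        }
        return w;
    }
}
```

Applied to glue, box(10), glue, box(20), glue, penalty, this keeps only box(10), glue, box(20). It would do the same whether those items form the first line, a middle line, or the last line of the paragraph.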

Besides that, I get the impression you're somewhat contradicting yourself here:
- in the comment on the failing testcase you noted that 'These tests fail because the Knuth element sequences for consecutive whitespace are not correct.'
- and now you're saying that it's not a matter of generating the correct element sequences.

Can you clarify? Doesn't this indicate that there is a difference in processing between the last line in a paragraph and all other lines... which seems inconsistent. A line is a line is a line, no matter at what position in the paragraph we find ourselves.


Cheers,

Andreas
