On Sat, 31 Dec 2005 09:23 pm, Andreas L Delmelle wrote:
> On Dec 31, 2005, at 08:26, Manuel Mall wrote:
> > On Sat, 31 Dec 2005 02:41 am, Andreas L Delmelle wrote:
> >> Point is: if trailing spaces in a line are correctly suppressed
> >> during line-building, the trailing spaces in the last inline of a
> >> given block would be removed in that step (no matter at what depth
> >> the inline is nested).
> > The problem is that the Knuth algorithm doesn't deal with spaces
> > (glue) at the end or beginning of a paragraph. It only discards space
> > (glue) when the algorithm creates a line break.
> Not always: see block_white-space-collapse_2.xml
> The reason why it fails is that the trailing spaces at the end of the
> first line aren't discarded. Specifying text-align="justify" makes
> the algorithm throw away the trailing spaces (maybe "end" or "right"
> too, haven't checked that yet)
These tests fail because the Knuth element sequences for consecutive
whitespace are not correct. A sequence of whitespace currently
generates a Knuth sequence (simplified) of the form:
pen - glue - pen - glue - pen - glue ....
This means every space becomes a valid break point. In the usual ignore
scenario (white-space-treatment="ignore...") this is incorrect, as the
only valid break point should be the first space (with all of the spaces
discarded if the break is taken). So the sequence should look more like:
pen - glue - glue - glue ....
The correct sequence for white-space-treatment="preserve" is more
interesting; every space becomes something like:
pen - box - pen(inf) - glue
The first penalty is the actual break possibility, the box prevents
discarding of the following glue if the break is chosen, the infinite
penalty prevents the glue from being a break possibility.
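To make the three shapes concrete, here is a minimal Java sketch that builds them as plain lists of labels. It is only an illustration: the class and method names are hypothetical and are not FOP's actual Knuth element classes, the widths and penalty values are left out, and the order used for the "preserve" case (penalty, box, infinite penalty, glue) is my reading of the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; these are not FOP's Knuth element classes. */
public class WhitespaceSequences {

    /** Current behaviour as described: every space is a break possibility. */
    public static List<String> current(int spaces) {
        List<String> seq = new ArrayList<>();
        for (int i = 0; i < spaces; i++) {
            seq.add("pen");
            seq.add("glue");
        }
        return seq;
    }

    /** "Ignore" case: one break possibility, followed by glue only. */
    public static List<String> ignore(int spaces) {
        List<String> seq = new ArrayList<>();
        if (spaces > 0) {
            seq.add("pen");
        }
        for (int i = 0; i < spaces; i++) {
            seq.add("glue");
        }
        return seq;
    }

    /** "Preserve" case (assumed order): break, box, infinite penalty, glue. */
    public static List<String> preserve(int spaces) {
        List<String> seq = new ArrayList<>();
        for (int i = 0; i < spaces; i++) {
            seq.add("pen");      // the actual break possibility
            seq.add("box");      // keeps the following glue from being discarded
            seq.add("pen(inf)"); // stops the glue itself from being a break point
            seq.add("glue");
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" - ", current(3)));  // pen - glue - pen - glue - pen - glue
        System.out.println(String.join(" - ", ignore(3)));   // pen - glue - glue - glue
        System.out.println(String.join(" - ", preserve(1))); // pen - box - pen(inf) - glue
    }
}
```

For a single space, current(1) and ignore(1) produce the same sequence, which fits the observation that the present code only happens to work in the single-space case.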
In summary the current Knuth sequences are incorrect and just happen to
work in the special case of a single space that is under
white-space-treatment="ignore-if-surrounding-linefeed". Luckily this is
the most common scenario.
> > It is (messy?) FOP custom code outside the core Knuth algorithm
> > which deals with removing glue at the
> > beginning and end of a paragraph. This should IMO therefore be dealt
> > with during refinement. I assume (haven't checked) that your
> > whitespace handling does remove all leading whitespace in a
> > paragraph and therefore it would make sense if it also removes all
> > trailing whitespace (nice symmetry :-)).
> Yeah, it would be a very nice symmetry :-)
> Well, it's definitely not impossible, but I'm wondering a bit about
> Cost vs. Benefit. Currently, when the trailing spaces for any inline
> are treated --in Inline.endOfNode()-- one has no way of knowing
> whether any text will still follow --possible subsequent nested
> inlines, text or characters will not be available yet.
This indicates to me that your redesigned algorithm has the same flaws
as we currently encounter with the inline layout manager structure. Any
problems which require looking across FO (= LM) boundaries suddenly
become hard. BTW, the original block level whitespace handling
refinement didn't have that problem as it had the whole block content
available to it. So I still think we have regressed here.
> In theory, we could keep a reference alive to the last FOText of the
> previous inline, so that when it appears at the end of the block, we
> could strip its trailing white-space too.
Yes, that is what you get when doing this in an FO-centric way. You have
to keep context / state / global variables to deal with "cross border"
issues.
> OTOH, if the white-space suppression in layout is made to work
> properly in all cases, those trailing spaces should automatically be
> removed since they are trailing in a line (whether it is the last
> line in the paragraph or not shouldn't make any difference).
> So, I held off FTM on trying to remove these spaces during
> refinement, and wanted to see if this problem doesn't get solved by
> tweaking the white-space removal during line-building.
> > Note that the point is that we don't need any special code to
> > discard whitespace around Knuth generated linebreaks as the
> > algorithm does that
> > for us (actually we need special code to prevent discards for
> > certain linefeed-treatment values but that is more of a matter of
> > generating Knuth sequences which allow breaks but don't discard and
> > does not require a change to the algorithms). Therefore the only
> > special case is
> > the beginning and end of a paragraph. As the beginning is handled
> > by whitespace handling at the FO level the end bit should be as
> > well.
> Apart from the aesthetic argument (nice symmetry): why exactly?
> Again, IMO, if the right element-sequences are generated for these
> white-spaces, they should be suppressed at the end of the paragraph
> anyway (forced EOL).
It's not a matter of generating the correct Knuth element sequences
because the algorithm doesn't care about what is at the beginning or
end of a paragraph. Giving the correct (= whitespace handled) paragraph
to the Knuth algorithm is a precondition. Again: line breaking deals
with adding breaks at optimal allowable points within the text; it
doesn't care what's at the start and end.
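As a rough illustration of that precondition, the following Java sketch strips discardable elements from both ends of a paragraph's element list before it would be handed to the line breaker. Everything here is an assumption made for illustration: the class name, the string labels and the rule that glue and penalties count as discardable are hypothetical, and the sketch assumes it runs before any end-of-paragraph elements (such as a forced final break) are appended.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not FOP's actual whitespace handling. */
public class ParagraphTrim {

    /** Assumption: glue and penalties are discardable, boxes are not. */
    static boolean isDiscardable(String element) {
        return element.equals("glue") || element.startsWith("pen");
    }

    /** Returns a copy of the list without leading and trailing discardable elements. */
    public static List<String> trim(List<String> elements) {
        int start = 0;
        int end = elements.size();
        // Skip discardable elements at the start of the paragraph.
        while (start < end && isDiscardable(elements.get(start))) {
            start++;
        }
        // Skip discardable elements at the end of the paragraph.
        while (end > start && isDiscardable(elements.get(end - 1))) {
            end--;
        }
        return new ArrayList<>(elements.subList(start, end));
    }
}
```

The breaker itself is untouched by this; it simply never sees whitespace at either end of the paragraph, which is the symmetry argued for above.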
> In the end, it's all the same to me, I guess...