On Oct 25, 2005, at 10:57, Manuel Mall wrote:
When FOP is collapsing (b) or removing (c) white space, are there any
fences we need to observe? For example, a border/padding between two
spaces (spaces represented by a .):
<fo:block>...<fo:inline
border="...">...Text ...</fo:inline>...</fo:block>
There are 4 sequences of 3 spaces each. What would we expect the final
outcome to be (assuming it fits on one line):
a) all removed: [border]Text[border]
b) only first and last removed: [border].Text.[border]
c) first, 2nd and last removed: [border]Text.[border]
d) ???
To me b) makes sense. However, a) is the HTML way and c) seems to be
what RenderX and AntennaHouse are doing. What do we want to do?
Having read that 1.1 definition more closely now, I'd say a). Somehow
it begins to fall into place...
This is a guess, but since an fo:block carries implicit line-breaks
with it, those spaces can both be dropped (unless white-space-
treatment="preserve", but we're talking default-values here). Since
the inline-area itself is the first and the last in the line-area,
the glyph-areas for the bounding spaces in the inline can be dropped/
ignored as well when it comes to text-alignment, borders, padding, etc.
Again: this is based on my own read of the 1.1 definition of white-
space-treatment (see previous post) as a property that has effects
both for XML whitespace characters --which could, in some cases,
already be ignored when parsing the input-- and glyph-areas
corresponding to fo:characters generating an XML whitespace character
in the output --this needs to wait until layout/line-building.
And what about this:
<fo:block>...A...<fo:inline
border="...">...Text ...</fo:inline>...B...</fo:block>
a) all removed: A[border]Text[border]B
b) only first and last removed: A.[border].Text.[border].B
c) only first and last removed and others collapsed across the
borders:
A.[border]Text.[border]B
d) ???
a) is most likely wrong, b) looks OK, c) is the HTML way.
Same thinking here, b) seems to be the way to go.
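To make that reading concrete, here is a minimal Python sketch (not FOP
code) of a rule that yields a) for the first example and b) for the
second: collapse runs of spaces inside each text run, treating borders
as fences, then suppress spaces at the start and end of the line,
looking through the borders. The names BORDER, collapse_within_runs,
strip_line_edge and normalize_line are made up for illustration, spaces
are written as "." as in the examples above, and everything is assumed
to fit on one line.

```python
import re

BORDER = "[border]"  # hypothetical marker for a border/padding fence


def collapse_within_runs(items):
    """Collapse each run of spaces ('.') to one, never across a border."""
    return [it if it == BORDER else re.sub(r"\.+", ".", it) for it in items]


def strip_line_edge(items, from_start):
    """Suppress spaces at one edge of the line, looking through borders."""
    order = range(len(items)) if from_start else range(len(items) - 1, -1, -1)
    for i in order:
        if items[i] == BORDER:
            continue  # the border itself stays; keep looking behind it
        items[i] = items[i].lstrip(".") if from_start else items[i].rstrip(".")
        if items[i]:
            break  # reached real content, stop suppressing


def normalize_line(items):
    """Assumes the whole content fits on a single line."""
    items = collapse_within_runs(items)
    strip_line_edge(items, from_start=True)
    strip_line_edge(items, from_start=False)
    return "".join(items)


first = ["...", BORDER, "...Text...", BORDER, "..."]
second = ["...A...", BORDER, "...Text...", BORDER, "...B..."]
print(normalize_line(first))   # [border]Text[border]
print(normalize_line(second))  # A.[border].Text.[border].B
```

Whether line-edge suppression should really look through a border is
exactly the open question; the sketch only shows that one consistent
rule can produce both answers.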
I wonder... what if:
1. as much as possible of the whitespace handling is done in the FO
parsing stage (before any LayoutManager is created)
2. after linefeed-treatment is handled, all remaining whitespace
characters are converted internally into fo:characters
This is precisely what the definition of fo:character seems to
prescribe for all characters, but that may be overkill (?)
As such, all those whitespace characters would get a default suppress-
at-line-break of "auto", meaning: for the plain old space --U+0020--
"suppress", and "retain" for all the others. So, in case of linefeed-
treatment="preserve":
<fo:block l-t="p">
</fo:block>
is the same as
<fo:block l-t="p"><fo:character character="&#xA;"
suppress-at-line-break="retain" .../></fo:block>
Which should IIC, in terms of layout, create something like a penalty
of -INFINITE (= effect should be a forced line-break), but the effect
of surrounding feasible breaks should be taken into account.
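A small Python sketch of how the "auto" value resolves, just to pin
down the rule: U+0020 gives "suppress", anything else gives "retain".
The function names are invented, and modelling a preserved linefeed as
a penalty of minus infinity is my assumption about how layout could
treat it, not a description of existing FOP code.

```python
import math

SPACE = "\u0020"
LINEFEED = "\n"


def resolve_suppress_at_line_break(ch, specified="auto"):
    """Resolve "auto" as described above: U+0020 suppresses, the rest retain."""
    if specified != "auto":
        return specified  # an explicit "suppress" or "retain" wins
    return "suppress" if ch == SPACE else "retain"


def break_penalty(ch):
    """Hypothetical: a preserved linefeed becomes a forced break (penalty -inf)."""
    return -math.inf if ch == LINEFEED else None


print(resolve_suppress_at_line_break(" "))   # suppress
print(resolve_suppress_at_line_break("\n"))  # retain
print(break_penalty("\n"))                   # -inf
```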
In case one is wondering: with default white-space-treatment and
preserved linefeeds, this means that if a linefeed glyph-area
immediately follows another line-break (start-block)...? Empty line
or not? The glyph-area is not deleted, and it should be the last area
of the line-area subset it occurs in, so I'm inclined to say: yes,
empty line.
Anyway, this would definitely mean something in terms of treating
whitespace consistently and uniformly, whether the stylesheet author
used explicit fo:characters or not. At the very least the treatment
between characters and fo:characters should be normalized in *some*
way vis-a-vis the layout-engine.
The other way around is certainly impossible, since we'd lose the
original fo:character's property info which is used during layout. In
between lies an idea of a temporary whitespace map, into which both
types of whitespace chars are stored as they are encountered (= {char
value, index in nodelist}). Normalize the map for a given FO element
when its full content is known, restructure the node accordingly,
removing superfluous whitespace characters, so that layout doesn't
even get to see them anymore.
<fo:block>   </fo:block>
would end up looking the same to the BlockLM as
a fo:block with three fo:character children with value " "
or a mixture of fo:characters and plain space characters
or a fo:block with only one space character
or even... an empty block. Regular spaces are suppressed by default
in case of surrounding line-breaks, no?
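A rough Python sketch of the whitespace-map idea, to show that the
first four variants end up identical after normalization. FoCharacter,
whitespace_map and normalize are hypothetical names, only the plain
space is treated as whitespace, and keeping the first space of a run is
an arbitrary choice on my part. Suppression at line breaks (the empty
block case) is not modelled.

```python
from dataclasses import dataclass


@dataclass
class FoCharacter:
    """Hypothetical stand-in for an fo:character node; properties are kept."""
    character: str
    background_color: str = "transparent"


def char_of(node):
    return node.character if isinstance(node, FoCharacter) else node


def flatten(children):
    """One node per character: plain characters stay str, fo:characters stay objects."""
    nodes = []
    for child in children:
        if isinstance(child, FoCharacter):
            nodes.append(child)
        else:
            nodes.extend(child)  # a str contributes one node per character
    return nodes


def whitespace_map(nodes):
    """(char value, index in node list) for every space, of either kind."""
    return [(char_of(n), i) for i, n in enumerate(nodes) if char_of(n) == " "]


def normalize(children):
    """Keep the first space of each run and drop the superfluous ones."""
    nodes = flatten(children)
    drop, prev = set(), None
    for _, i in whitespace_map(nodes):
        if prev is not None and i == prev + 1:
            drop.add(i)  # directly follows another space
        prev = i
    return [n for i, n in enumerate(nodes) if i not in drop]


def text_of(nodes):
    return "".join(char_of(n) for n in nodes)


variants = [
    ["   "],
    [FoCharacter(" "), FoCharacter(" "), FoCharacter(" ")],
    [" ", FoCharacter(" "), " "],
    [" "],
]
print([text_of(normalize(v)) for v in variants])  # [' ', ' ', ' ', ' ']
```

Since the surviving node is kept as-is, an fo:character that survives
still carries its own properties into layout.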
Another funny one: if it were an fo:inline, at least one space would
have to remain, since it is unknown whether the inline will be first/
last in the line. Now, what in case you have three fo:characters with
value " " and they have different background-colors? First,
middle or last? :-)
Cheers,
Andreas