On Oct 25, 2005, at 10:57, Manuel Mall wrote:

When FOP is collapsing (b) or removing (c) white space, are there any
fences we need to observe? For example, a border/padding between two
spaces, e.g. (spaces represented by a .):
<fo:block>...<fo:inline border="...">...Text...</fo:inline>...</fo:block>
There are 4 sequences of 3 spaces each. What would we expect the final
outcome to be (assuming it fits on one line):
a) all removed: [border]Text[border]
b) only first and last removed: [border].Text.[border]
c) first, 2nd and last removed: [border]Text.[border]
d) ???

To me b) makes sense. However, a) is the HTML way and c) seems to be what
RenderX and AntennaHouse are doing. What do we want to do?

Having read that 1.1 definition more closely now, I'd say a). Somehow it begins to fall into place...

This is a guess, but since an fo:block carries implicit line-breaks with it, the spaces at its very start and end can both be dropped (unless white-space-treatment="preserve", but we're talking default values here). Since the inline-area itself is both the first and the last in the line-area, the glyph-areas for the bounding spaces inside the inline can be dropped/ignored as well when it comes to text-alignment, borders, padding, etc.

Again: this is based on my own reading of the 1.1 definition of white-space-treatment (see previous post) as a property that affects both XML whitespace characters (which could, in some cases, already be ignored when parsing the input) and glyph-areas corresponding to fo:characters that generate an XML whitespace character in the output (this needs to wait until layout/line-building).


And what about this:
<fo:block>...A...<fo:inline border="...">...Text...</fo:inline>...B...</fo:block>

a) all removed: A[border]Text[border]B
b) only first and last removed: A.[border].Text.[border].B
c) only first and last removed and others collapsed across the borders:
A.[border]Text.[border]B
d) ???

a) is most likely wrong, b) looks OK, c) is the HTML way.

Same thinking here, b) seems to be the way to go.
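
To make that concrete, here is a minimal sketch (Python, purely illustrative, not FOP code) of one rule that would give a) for the first example and b) for the second. It assumes the notation from the mail: "." is a space, and "[" / "]" are hypothetical markers for the inline's border edges. The rule itself is only my reading: runs of spaces collapse to one without crossing a fence, and spaces at the start and end of the line are dropped even when a fence sits in between.

```python
SPACE = "."    # the mail's notation for a space
FENCES = "[]"  # hypothetical markers for the inline's border edges


def collapse_within_fences(line):
    """Collapse each run of spaces to one; a fence ends the run."""
    out = []
    for ch in line:
        if ch == SPACE and out and out[-1] == SPACE:
            continue  # drop the 2nd, 3rd, ... space of a run
        out.append(ch)
    return "".join(out)


def _strip_leading(chars):
    """Drop spaces before the first real character, looking through fences."""
    kept, at_edge = [], True
    for ch in chars:
        if at_edge and ch == SPACE:
            continue
        if ch not in FENCES:
            at_edge = False  # first non-space, non-fence character
        kept.append(ch)
    return kept


def strip_line_edges(line):
    """Apply the leading-space rule from both ends of the line."""
    chars = _strip_leading(list(line))
    chars = _strip_leading(chars[::-1])[::-1]
    return "".join(chars)


def resolve(line):
    return strip_line_edges(collapse_within_fences(line))


print(resolve("...[...Text...]..."))          # [Text]
print(resolve("...A...[...Text...]...B..."))  # A.[.Text.].B
```

Whether spaces at the line edges should really be dropped through a border is exactly the open question, so take the second function as an assumption rather than as what the spec says.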

I wonder... what if:
1. as much as possible of the whitespace handling is done in the FO parsing stage (before any LayoutManager is created)
2. after linefeed-treatment is handled, all remaining whitespace characters are converted internally into fo:characters

This is precisely what the definition of fo:character seems to prescribe for all characters, but that may be overkill (?)

As such, all those whitespace characters would get a default suppress-at-line-break of "auto", meaning "suppress" for the plain old space (U+0020) and "retain" for all the others. So, in case of linefeed-treatment="preserve":

<fo:block linefeed-treatment="preserve">&#x0A;</fo:block>

is the same as

<fo:block linefeed-treatment="preserve"><fo:character character="&#x0A;"
   suppress-at-line-break="retain" .../></fo:block>

That should, IIRC, in terms of layout, create something like a penalty of -INFINITE (i.e. the effect should be a forced line-break), but the effect of surrounding feasible breaks should be taken into account. In case one is wondering: with default white-space-treatment and preserved linefeeds, what happens if a linefeed glyph-area immediately follows another line-break (such as the start of the block)? Empty line or not? The glyph-area is not deleted, and it should be the last area of the line-area subset it occurs in, so I'm inclined to say: yes, empty line.
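
Roughly, in code (a Python sketch under my own assumptions, not FOP's actual classes): "auto" resolves per character as described above, and a preserved linefeed keeps its glyph and is followed by a forced-break penalty. The names Glyph, Penalty and to_elements are made up for illustration.

```python
import math
from dataclasses import dataclass

FORCED_BREAK = -math.inf  # stands in for a penalty of -INFINITE


@dataclass
class Glyph:
    char: str
    suppress_at_line_break: str  # resolved value: "suppress" or "retain"


@dataclass
class Penalty:
    value: float


def resolve_suppress_at_line_break(char, specified="auto"):
    """Resolve "auto": plain space is suppressed, everything else retained."""
    if specified != "auto":
        return specified
    return "suppress" if char == "\u0020" else "retain"


def to_elements(text):
    """Assumes linefeed-treatment="preserve": each linefeed forces a break."""
    elements = []
    for ch in text:
        # The glyph is kept in the sequence, it is not deleted.
        elements.append(Glyph(ch, resolve_suppress_at_line_break(ch)))
        if ch == "\n":
            elements.append(Penalty(FORCED_BREAK))
    return elements
```

For a block containing only a linefeed, to_elements("\n") yields the retained glyph followed by the forced break, which is the "empty line" reading.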

Anyway, this would definitely help in treating whitespace consistently and uniformly, whether the stylesheet author used explicit fo:characters or not. At the very least, the treatment of plain characters and fo:characters should be normalized in *some* way vis-a-vis the layout engine. The other way around (turning fo:characters into plain characters) is certainly impossible, since we'd lose the original fo:character's property info, which is used during layout.

In between lies the idea of a temporary whitespace map, in which both types of whitespace chars are stored as they are encountered (= {char value, index in nodelist}). Normalize the map for a given FO element when its full content is known, then restructure the node accordingly, removing superfluous whitespace characters, so that layout doesn't even get to see them anymore.

<fo:block>&#x20;&#x20;&#x20;</fo:block>

would end up looking the same to the BlockLM as

an fo:block with three fo:character children with value " "
or a mixture of fo:character and &#x20;
or an fo:block with only one space character
or even... an empty block. Regular spaces are suppressed by default in case of surrounding line-breaks, no?
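
A rough sketch of the whitespace-map idea (Python for brevity; FoCharacter, build_whitespace_map and normalize are hypothetical names, and the child list is assumed to hold one character per node):

```python
from dataclasses import dataclass

XML_WHITESPACE = " \t\n\r"


@dataclass
class FoCharacter:
    character: str  # stand-in for an explicit fo:character child


def char_of(node):
    """Plain characters and fo:characters are looked at the same way."""
    return node.character if isinstance(node, FoCharacter) else node


def build_whitespace_map(nodes):
    """Record (char value, index in nodelist) for every whitespace child."""
    return [(char_of(n), i) for i, n in enumerate(nodes)
            if char_of(n) in XML_WHITESPACE]


def normalize(nodes):
    """Drop every space that directly follows another space."""
    drop, prev = set(), None
    for ch, i in build_whitespace_map(nodes):
        if ch == " " and prev is not None and prev == (" ", i - 1):
            drop.add(i)  # superfluous: layout never gets to see it
        prev = (ch, i)
    return [n for i, n in enumerate(nodes) if i not in drop]


mixed = [" ", FoCharacter(" "), " "]
print(normalize(mixed))  # [' ']
```

This only covers the collapse to a single space; whether that last space goes too depends on the surrounding line-breaks, which this sketch does not model. It also keeps the first node of each run, which is an arbitrary choice.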

Another funny one: if it were an fo:inline, at least one space would have to remain, since it is unknown whether the inline will be first/last in the line. Now, what if you have three fo:characters with value "&#x20;" and they have different background-colors? Which one remains: first, middle or last? :-)


Cheers,

Andreas
