Andreas,

Excellent. I think your interpretation and mine have now largely
converged.

A bit more inline.

On Fri, 28 Oct 2005 04:58 am, Andreas L Delmelle wrote:
> On Oct 25, 2005, at 10:57, Manuel Mall wrote:
> > When FOP is collapsing (b) or removing (c) white space are there
> > any fences we need to observe. For example a border/padding between
> > two spaces, e.g. (spaces represented by a .):
> > <fo:block>...<fo:inline
> > border="...">...Text ...</fo:inline>...</fo:block>
> > There are 4 sequences of 3 spaces each. What would we expect the
> > final outcome to be (assuming it fits on one line):
> > a) all removed: [border]Text[border]
> > b) only first and last removed: [border].Text.[border]
> > c) first, 2nd and last removed: [border]Text.[border]
> > d) ???
> >
> > To me b) makes sense. However, a) is the HTML way and c) seems what
> > RenderX and AntennaHouse are doing. What do we want to do?
>
> Having read that 1.1 definition more closely now, I'd say a). Somehow
> it begins to fall into place...
>
I fully agree with your analysis of the spec here: option a) is the
most likely answer that can be derived from its wording. Whether it is
the intended or most sensible answer is another question. A bit more
below.

<snip/>
> > And what about this:
> > <fo:block>...A...<fo:inline
> > border="...">...Text ...</fo:inline>...B...</fo:block>
> >
> > a) all removed: A[border]Text[border]B
> > b) only first and last removed: A.[border].Text.[border].B
> > c) only first and last removed and others collapsed across the
> > borders:
> > A.[border]Text.[border]B
> > d) ???
> >
> > a) is most likely wrong, b) looks OK, c) is the HTML way.
>
> Same thinking here, b) seems to be the way to go.
>

We agree, but did you notice the difference it would make in visual
appearance if the <inline> just happens to be at the beginning or end
of the line and we follow option a) from the first example? That is, if
we have a line break after the A you would get:
[border]Text.[border].B
If we have a line break before the B you would get:
A.[border].Text[border]
That is, depending on where the line breaks fall, there would or would
not be a space between the border and the word 'Text'. It is these
'strange' or 'asymmetric' outcomes that made me think a border should
possibly act as a fence with respect to whitespace removal (option b)
in the first example).
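
To make the asymmetry concrete, here is a rough sketch in Python (not
FOP code; trim_line, render and the token model are made up for
illustration). It assumes runs of spaces have already been collapsed to
single spaces, and that the space at the break point stays on the line
before the break.

```python
BORDER = "[border]"  # hypothetical marker for a border/padding edge


def trim_line(tokens, border_is_fence):
    """Remove spaces at both edges of one line of tokens.

    If border_is_fence is False, removal continues past a border marker
    (option a); if True, removal stops at the first border (option b).
    """
    def trim_leading(seq):
        out = []
        trimming = True
        for tok in seq:
            if trimming and tok == " ":
                continue  # drop a space at the line edge
            if trimming and tok == BORDER and not border_is_fence:
                out.append(tok)  # keep the border, keep trimming behind it
                continue
            trimming = False
            out.append(tok)
        return out

    trimmed = trim_leading(tokens)
    return trim_leading(trimmed[::-1])[::-1]


def render(tokens):
    """Show spaces as dots, as in the examples above."""
    return "".join("." if tok == " " else tok for tok in tokens)


# The line holding the inline, for the two break positions discussed.
after_a = [BORDER, " ", "Text", " ", BORDER, " ", "B"]
before_b = ["A", " ", BORDER, " ", "Text", " ", BORDER, " "]

for fence in (False, True):
    print(fence, render(trim_line(after_a, fence)),
          render(trim_line(before_b, fence)))
# False [border]Text.[border].B A.[border].Text[border]
# True [border].Text.[border].B A.[border].Text.[border]
```

Without the fence the space next to 'Text' comes and goes with the
break position; with the fence both spaces inside the border survive
in either case.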

> I wonder... what if:
> 1. as much as possible of the whitespace handling is done in the FO
> parsing stage (before any LayoutManager is created)
Yes, isn't that what the refinement stage is about (partly)?
> 2. after linefeed-treatment is handled, all remaining whitespace
> characters are converted internally into fo:characters
>
> This is precisely what the definition of fo:character seems to
> prescribe for all characters, but that may be overkill (?)
Logically, yes. Practically, within FOP, no: creating separate FOs and
possibly areas for each character is most likely prohibitive in terms
of memory consumption and processing time. But logically FOP should
behave as if that is what is happening, especially if we want to
implement Unicode-compliant line breaking, bidi, etc. This needs to be
done on a per-paragraph basis and not on a per-'text section' basis as
it is now. That is, the analysis of where a line break opportunity
occurs must go across <inline> boundaries, include <fo:character>s,
etc.

>
> As such, all those whitespace characters would get a default
> suppress-at-line-break of "auto", meaning: for the plain old space
> --U+0020-- "suppress", and "retain" for all the others. So, in case
> of linefeed-treatment="preserve":
>
> <fo:block l-t="p">&#x0A;</fo:block>
>
> is the same as
>
> <fo:block l-t="p"><fo:character character="&#x0A;"
>     suppress-at-line-break="retain" .../></fo:block>
>
> Which should IIC, in terms of layout, create something like a penalty
> of -INFINITE (= effect should be a forced line-break), but the effect
> of surrounding feasible breaks should be taken into account.
> In case one is wondering: with default white-space-treatment and
> preserved linefeeds, this means that if a linefeed glyph-area
> immediately follows another line-break (start-block)...? Empty line
> or not? The glyph-area is not deleted, and it should be the last area
> of the line-area subset it occurs in, so I'm inclined to say: yes,
> empty line.
Agree
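
For what it's worth, the "auto" resolution for suppress-at-line-break
that you describe above boils down to something like the following
Python sketch. The function name is made up, and this is only my
reading of the rule, not FOP code.

```python
def resolve_suppress_at_line_break(char, specified="auto"):
    """Resolve suppress-at-line-break for a single character.

    "auto" maps the plain space U+0020 to "suppress" and every other
    character to "retain"; an explicit value is returned unchanged.
    """
    if specified != "auto":
        return specified
    return "suppress" if char == "\u0020" else "retain"


print(resolve_suppress_at_line_break(" "))   # suppress
print(resolve_suppress_at_line_break("\n"))  # retain
```

So a preserved linefeed is never dropped at a line break, which is
what leads to the empty line.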

>
> Anyway, this would definitely mean something in terms of treating
> whitespace consistently and uniformly, whether the stylesheet author
> used explicit fo:characters or not. At the very least the treatment
> between characters and fo:characters should be normalized in *some*
> way vis-a-vis the layout-engine.
Agreed. It probably means that when we operate on this content we need
character iterators that traverse 'plain text', nested inlines,
fo:characters, etc. transparently.
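
Roughly what I have in mind, as a Python sketch rather than FOP code.
The Text, Character and Inline classes and iter_chars are hypothetical
stand-ins for the real FO tree nodes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class Text:
    """A run of plain character data."""
    data: str


@dataclass
class Character:
    """Stands in for an explicit fo:character."""
    char: str


@dataclass
class Inline:
    """Stands in for fo:inline (or the enclosing fo:block)."""
    children: List["Node"] = field(default_factory=list)


Node = Union[Text, Character, Inline]


def iter_chars(node: Node) -> Iterator[Tuple[str, Node]]:
    """Yield (char, owner) pairs in document order.

    Nested inlines are descended into, so the caller sees one flat
    character sequence but can still get at the owning node.
    """
    if isinstance(node, Text):
        for ch in node.data:
            yield ch, node
    elif isinstance(node, Character):
        yield node.char, node
    else:
        for child in node.children:
            yield from iter_chars(child)


block = Inline([Text("A "), Inline([Character(" "), Text("Text")]), Text(" B")])
print(repr("".join(ch for ch, _ in iter_chars(block))))  # 'A  Text B'
```

Keeping the owner with each character means the whitespace and line
breaking code can work on the flat sequence and still find the
properties of an fo:character when it needs them.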

> The other way around is certainly impossible, since we'd lose the
> original fo:character's property info which is used during layout. In
> between lies an idea of a temporary whitespace map, into which both
> types of whitespace chars are stored as they are encountered (= {char
> value, index in nodelist}). Normalize the map for a given FO element
> when its full content is known, restructure the node accordingly,
> removing superfluous whitespace characters, so that layout doesn't
> even get to see them anymore.
>
> <fo:block>&#x20;&#x20;&#x20;</fo:block>
>
> would end up looking the same to the BlockLM as
>
> a fo:block with three fo:character children with value " "
> or a mixture between fo:character and &#x20;
> or a fo:block with only one space character
> or even... an empty block. Regular spaces are suppressed by default
> in case of surrounding line-breaks, no?
Yes - that's the outcome we need.

>
> Another funny one: if it were an fo:inline, at least one space would
> have to remain, since it is unknown whether the inline will be
> first/last in the line. Now, what in case you have three
> fo:characters with value "&#x20;" and they have different
> background-colors? First, middle or last? :-)
I think the spec is quite clear (a pleasant change) on this where it
describes whitespace collapse: deletion proceeds from the last sibling
to the first (right to left in l-r writing modes). This means the
first (leftmost) whitespace character survives.
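
As a small Python sketch of that reading (collapse_spaces and the
colour labels are made up for illustration): a forward scan that drops
every space directly following another space gives the same survivor
as deleting from the last sibling towards the first.

```python
def collapse_spaces(chars):
    """Collapse runs of spaces, keeping only the first of each run.

    chars is a list of (char, label) pairs; the label stands in for
    whatever properties the character carries, e.g. background-color.
    """
    out = []
    for ch, label in chars:
        if ch == " " and out and out[-1][0] == " ":
            continue  # a later sibling in a run of spaces is deleted
        out.append((ch, label))
    return out


spaces = [(" ", "red"), (" ", "green"), (" ", "blue")]
print(collapse_spaces(spaces))  # [(' ', 'red')]
```

So in your example the background colour of the first fo:character
would be the one that shows.
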
>
>
> Cheers,
>
> Andreas

Cheers

Manuel
