Re: White space handling Wiki page

Andreas L Delmelle Fri, 28 Oct 2005 12:09:35 -0700

On Oct 28, 2005, at 12:28, Manuel Mall wrote:

On Fri, 28 Oct 2005 04:58 am, Andreas L Delmelle wrote:

(second example)

Same thinking here, b) seems to be the way to go.


We agree, but did you notice the difference it would make in visual
appearance if the <inline> just happens to be at the beginning/end of
the line, if we follow option a) from the first example? That is, if we
have a line break after the A you would get:
[border]Text.[border].B
If we have a line break before the B you would get:
A.[border].Text[border].B


Errm, typo? I'd delete the space before 'B' as well, so fully:
A.[border].Text[border]
B

That is, depending on where the line breaks are, there would or would
not be a space between the border and the word 'Text'. It is these
'strange' or 'asymmetric' outcomes which made me think that a border
should possibly act like a fence with respect to whitespace removal
(option b) in the first example).

In a certain way, yes. The white-spaces before 'A' and after 'B' can be literally removed from the stream (so that they don't have a corresponding glyph-area; why create one if you already know it's going to be deleted much further on?), while the other space-sequences can at most be collapsed to one space. These single spaces will automatically have glyph-areas which will or will not be deleted, depending on whether a line-break precedes/follows.

So I agree, but I don't think the borders need to be explicitly tracked/checked for this, as they coincide with the boundaries of the inline anyway. The effect of the border acting as a fence should be seen more as a consequence, following naturally from the process of whitespace handling. It's the element borders --but in XML markup terms, not the presence of border properties-- that act as 'fences' (quoted since the term is not really applicable at that level).
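To make the asymmetry concrete, here is a hypothetical Java sketch (none of these names exist in FOP). The first example is not quoted here, so treating option a) as "space suppression at a line break looks straight through a border" is an assumption, chosen because it reproduces the outputs quoted above; option b) makes the border a fence that stops the suppression scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SpaceAtBreakSketch {

    static final String SPACE = " ";         // a single, already-collapsed space
    static final String BORDER = "[border]"; // start or end border of the inline

    // A <inline>Text</inline> B, with one collapsed space on each side of each border
    static final List<String> EXAMPLE = List.of(
            "A", SPACE, BORDER, SPACE, "Text", SPACE, BORDER, SPACE, "B");

    /** Splits items into lines, starting a new line before each index in breakBefore. */
    static List<String> layout(List<String> items, Set<Integer> breakBefore, boolean borderIsFence) {
        List<String> rendered = new ArrayList<>();
        List<String> line = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            if (breakBefore.contains(i) && !line.isEmpty()) {
                rendered.add(renderLine(line, borderIsFence));
                line = new ArrayList<>();
            }
            line.add(items.get(i));
        }
        rendered.add(renderLine(line, borderIsFence));
        return rendered;
    }

    /**
     * Deletes spaces at the start and end of a line. If borderIsFence is false
     * the scan looks past borders (assumed option a); if true a border stops it (option b).
     */
    static String renderLine(List<String> line, boolean borderIsFence) {
        boolean[] drop = new boolean[line.size()];
        for (int i = 0; i < line.size(); i++) {          // leading spaces
            if (!markDroppable(line.get(i), drop, i, borderIsFence)) break;
        }
        for (int i = line.size() - 1; i >= 0; i--) {     // trailing spaces
            if (!markDroppable(line.get(i), drop, i, borderIsFence)) break;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.size(); i++) {
            if (!drop[i]) sb.append(line.get(i));
        }
        return sb.toString();
    }

    /** Marks a space for deletion; returns whether the scan may continue past this item. */
    private static boolean markDroppable(String item, boolean[] drop, int i, boolean borderIsFence) {
        if (item.equals(SPACE)) {
            drop[i] = true;
            return true;
        }
        return item.equals(BORDER) && !borderIsFence;
    }

    public static void main(String[] args) {
        for (boolean fence : new boolean[] {false, true}) {
            String label = fence ? "option b" : "option a";
            System.out.println(label + ", break after A:  " + layout(EXAMPLE, Set.of(2), fence));
            System.out.println(label + ", break before B: " + layout(EXAMPLE, Set.of(8), fence));
        }
    }
}
```

With the fence (option b) the inline always renders as "[border] Text [border]", whichever side the break falls on; with the assumed option a) the space just inside a border appears or disappears with the break.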

My main point was the difference between blocks and inlines in this respect.

For instance, the following different possibilities:

1) <fo:block> A ...
2) <fo:block>       A ...
3) <fo:block> &#x0A;A ...
4) <fo:block>&#x09;&#x0D;&#x0A;&#x20; A ...

would all be treated during layout as if they were
<fo:block>A ...

(supposing default values for all related properties)

The space between 'A' and '...' always remains --whether the '...' refers to content or markup for a nested inline-- but the spaces between the start-block markup and the character 'A' are all dropped (= implicit line-break immediately preceding).

The same holds for XML whitespace between the last non-whitespace character and the end-block markup.
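A minimal Java sketch of the block case, assuming default property values and folding the separate steps (linefeed-treatment, white-space-collapse, suppression at the implicit line breaks) into a single pass. The class and method names are hypothetical, not FOP code:

```java
public class BlockWhitespaceSketch {

    /** XML whitespace characters: U+0009, U+000A, U+000D, U+0020. */
    static boolean isXmlWhitespace(char c) {
        return c == '\t' || c == '\n' || c == '\r' || c == ' ';
    }

    /**
     * Simplified stand-in for default block whitespace handling: whitespace
     * runs between content collapse to one space; runs touching the block
     * start or end are dropped (the implicit line breaks there).
     */
    static String normalizeBlockText(String text) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isXmlWhitespace(c)) {
                pendingSpace = true;
            } else {
                if (pendingSpace && out.length() > 0) {
                    out.append(' '); // interior run: keep exactly one space
                }
                pendingSpace = false;
                out.append(c);
            }
        }
        return out.toString(); // a trailing run is never emitted
    }

    public static void main(String[] args) {
        String[] variants = {
            " A ...",          // 1)
            "       A ...",    // 2)
            " \nA ...",        // 3) &#x0A;
            "\t\r\n  A ...",   // 4) &#x09;&#x0D;&#x0A;&#x20; plus a literal space
        };
        for (String v : variants) {
            System.out.println("[" + normalizeBlockText(v) + "]");
        }
    }
}
```

All four variants from the list above come out as "A ...", and so would any whitespace added before the end-block markup.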

For inlines, this becomes nearly the opposite: only if the current inline FO is the first child node of its parent (first inline in a block, no preceding characters) could we cheat and throw away any whitespace between the start-inline markup and the first non-whitespace character. As a general rule of thumb (ROT), those white-spaces can at most be collapsed to a single space, since they could end up in the middle of a line area.

Generally, inserting spaces, tabs or linefeeds as the first/last characters of an fo:block should make no difference, but for an fo:inline this would always result in an extra space in the output if it ends up in the middle of a line.
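The inline rule of thumb can be sketched the same way (hypothetical names again). Dropping the trailing run when the inline is the last content of its block is an assumption, made by symmetry with the first-child case described above:

```java
public class InlineWhitespaceSketch {

    /** XML whitespace characters: U+0009, U+000A, U+000D, U+0020. */
    static boolean isXmlWhitespace(char c) {
        return c == '\t' || c == '\n' || c == '\r' || c == ' ';
    }

    /**
     * Collapses each whitespace run in an inline's text to one space. A leading
     * run is dropped only when the inline is the first content of its block,
     * a trailing run only when it is the last (assumed by symmetry).
     */
    static String normalizeInlineText(String text, boolean firstInBlock, boolean lastInBlock) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        boolean seenContent = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isXmlWhitespace(c)) {
                pendingSpace = true;
                continue;
            }
            // a leading run survives as one space unless we may "cheat"
            if (pendingSpace && (seenContent || !firstInBlock)) {
                out.append(' ');
            }
            pendingSpace = false;
            seenContent = true;
            out.append(c);
        }
        if (pendingSpace && !lastInBlock) {
            out.append(' '); // could land mid-line, so keep one space
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\n  Text\n  ";
        System.out.println("[" + normalizeInlineText(text, false, false) + "]");
        System.out.println("[" + normalizeInlineText(text, true, false) + "]");
        System.out.println("[" + normalizeInlineText(text, true, true) + "]");
    }
}
```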


Talking nested blocks:

<fo:block> A <fo:block> B ...

is the same as

<fo:block>A<fo:block>B ...
or
<fo:block>A
  <fo:block>B

The above doesn't hold for nested inlines, hence: beware of indent="yes" in XSLT. In the case of deeply nested inlines, this could result in the number of spaces in the output increasing with the depth of the fo:inline in the source document. :-)
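A hypothetical sketch of the indent="yes" effect: it serializes nested fo:inline elements roughly the way an indenting serializer might, then renders the text nodes either collapsed per inline (the rule of thumb above) or collapsed across inline boundaries. Which of the two FOP should do is exactly what is being discussed here, so the flag is illustrative only:

```java
public class NestedInlineIndentSketch {

    /** Serializes `depth` nested fo:inline elements with two-space indentation. */
    static String serialize(int depth, String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ".repeat(i)).append("<fo:inline>");
            if (i < depth - 1) sb.append('\n');
        }
        sb.append(text);
        for (int i = depth - 1; i >= 0; i--) {
            if (i < depth - 1) sb.append('\n').append("  ".repeat(i));
            sb.append("</fo:inline>");
        }
        return sb.toString();
    }

    /**
     * Renders the text nodes between tags. Each node's whitespace runs become
     * one space; if collapseAcrossInlines is true, adjacent spaces coming from
     * different nodes are collapsed as well.
     */
    static String render(String markup, boolean collapseAcrossInlines) {
        StringBuilder out = new StringBuilder();
        for (String node : markup.split("<[^>]+>")) {
            out.append(node.replaceAll("[ \\t\\r\\n]+", " "));
        }
        String result = out.toString();
        return collapseAcrossInlines ? result.replaceAll(" {2,}", " ") : result;
    }

    public static void main(String[] args) {
        for (int depth = 1; depth <= 4; depth++) {
            String markup = serialize(depth, "x");
            System.out.println("depth " + depth + ": per inline [" + render(markup, false)
                    + "], across inlines [" + render(markup, true) + "]");
        }
    }
}
```

Collapsed per inline, a text at depth d is surrounded by d-1 spaces on each side, which is the growth described above; collapsed across inline boundaries it never gets more than one.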


<snip />

2. after linefeed-treatment is handled, all remaining whitespace
characters are converted internally into fo:characters

This is precisely what the definition of fo:character seems to
prescribe for all characters, but that may be overkill (?)

Yes, logically. Practically, within FOP, no, because creating separate FOs
and possibly areas for each character is most likely prohibitive in
terms of memory consumption and processing.
But logically FOP should behave as if that is what is happening,
especially if we want to implement Unicode-compliant line breaking,
bidi, etc. This needs to be done on a per-paragraph basis and not
on a per-'text section' basis as it is now. That is, the analysis of
where a line-break opportunity is must go across <inline> boundaries,
include <fo:character>s, etc.
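To illustrate why the analysis has to span the whole paragraph, here is a sketch that uses java.text.BreakIterator purely as a stand-in for a proper UAX #14 implementation (it is not what FOP uses, and the split into "sections" is a hypothetical model of per-inline text). For "foo<fo:inline>bar baz</fo:inline>", analysing each section on its own invents an opportunity at the inline boundary:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ParagraphBreakSketch {

    /** Adds the line-break opportunities found in text, shifted by offset, to out. */
    static void collectBreaks(String text, int offset, List<Integer> out) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text);
        it.first();
        for (int end = it.next(); end != BreakIterator.DONE; end = it.next()) {
            out.add(offset + end);
        }
    }

    /** Current behaviour as described above: each text section analysed in isolation. */
    static List<Integer> breaksPerSection(List<String> sections) {
        List<Integer> out = new ArrayList<>();
        int offset = 0;
        for (String s : sections) {
            collectBreaks(s, offset, out);
            offset += s.length();
        }
        return out;
    }

    /** Proposed behaviour: one analysis over the whole paragraph's text. */
    static List<Integer> breaksWholeParagraph(List<String> sections) {
        List<Integer> out = new ArrayList<>();
        collectBreaks(String.join("", sections), 0, out);
        return out;
    }

    public static void main(String[] args) {
        List<String> sections = List.of("foo", "bar baz"); // text before and inside the inline
        System.out.println("per section:     " + breaksPerSection(sections));
        System.out.println("whole paragraph: " + breaksWholeParagraph(sections));
    }
}
```

Per section, offset 3 (between "foo" and "bar") is reported as a break opportunity; over the whole paragraph the only candidates are after "foobar " and at the end.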

Not necessarily separate FOs, but the same type of LayoutManager would probably be more in the right direction. CharLM (or a subclass?) should be able to operate on either an attached fo:character or a simple char instance variable, instantiated either from an explicit fo:character object, or by the TextLM responsible for the larger context from a Unicode whitespace character it encounters (instead of creating the elements for whitespace itself, the TextLM instantiates a CharLM to delegate to?).

At the same time, the TextLM's operating context for line-breaking should indeed always be the full block/paragraph, instead of merely the text of the current inline. Maybe this could also be dealt with by passing state info from the parent's TextLM into the inline's own TextLM, so that it can use that to answer the question whether there is a legal break-opportunity before the inline (ex.: the last character before the inline was of a Unicode class that prohibits line breaks after it, so an infinite penalty for breaking before...). It doesn't matter that much whether a break-opportunity is created. The most important thing is that the opportunity is given the appropriate degree of favorability, taking into account the constraints for Unicode line-breaking across FO element boundaries.

The CharLM would deal with determining the value of suppress-at-line-break for its associated character (if Unicode whitespace), and generate an appropriate sequence of elements.


...or something like that?
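A rough sketch of the proposed split, with every class, enum and constant hypothetical (this is not FOP's actual CharLM/TextLM API). It assumes the XSL definition of suppress-at-line-break="auto" (suppress U+0020, retain everything else), uses a tiny hand-picked subset of the Unicode line-breaking classes (opening brackets, class OP, and U+00A0, class GL, forbid a break after them), and uses 1000 as a stand-in for an infinite penalty:

```java
import java.util.ArrayList;
import java.util.List;

public class CharLMSketch {

    enum SuppressAtLineBreak { AUTO, SUPPRESS, RETAIN }

    /** Simplified Knuth-style element kinds; widths and stretch are omitted. */
    enum Kind { BOX, GLUE, PENALTY }

    /** Hypothetical stand-in for an infinite penalty (break forbidden). */
    static final int INFINITE_PENALTY = 1000;

    static final class Element {
        final Kind kind;
        final int penalty;
        final String text;

        Element(Kind kind, int penalty, String text) {
            this.kind = kind;
            this.penalty = penalty;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + (kind == Kind.PENALTY ? String.valueOf(penalty) : "'" + text + "'") + ")";
        }
    }

    /** One character, whether from an explicit fo:character or delegated by a TextLM. */
    static final class CharLM {
        final char ch;
        final SuppressAtLineBreak suppress;

        CharLM(char ch, SuppressAtLineBreak suppress) {
            this.ch = ch;
            this.suppress = suppress;
        }

        /** "auto" resolves to suppress for U+0020 and to retain for anything else. */
        boolean isSuppressedAtLineBreak() {
            if (suppress == SuppressAtLineBreak.AUTO) {
                return ch == ' ';
            }
            return suppress == SuppressAtLineBreak.SUPPRESS;
        }

        /** Simplification: suppressible chars become glue (dropped at a break), others a box. */
        List<Element> elements() {
            List<Element> out = new ArrayList<>();
            Kind kind = isSuppressedAtLineBreak() ? Kind.GLUE : Kind.BOX;
            out.add(new Element(kind, 0, String.valueOf(ch)));
            return out;
        }
    }

    /** Tiny subset of UAX #14: no break after opening brackets (OP) or NBSP (GL). */
    static boolean prohibitsBreakAfter(char c) {
        return c == '(' || c == '[' || c == '{' || c == '\u00A0';
    }

    /**
     * Penalty for breaking right before an inline, given the last character the
     * parent TextLM saw. Everything outside the subset gets a neutral 0, which is
     * too permissive (e.g. between two letters); a real implementation would
     * also look at the inline's first character, since most rules are pairwise.
     */
    static Element penaltyBeforeInline(char lastCharBefore) {
        int value = prohibitsBreakAfter(lastCharBefore) ? INFINITE_PENALTY : 0;
        return new Element(Kind.PENALTY, value, "");
    }

    public static void main(String[] args) {
        System.out.println(new CharLM(' ', SuppressAtLineBreak.AUTO).elements());
        System.out.println(new CharLM('\u00A0', SuppressAtLineBreak.AUTO).elements());
        System.out.println(penaltyBeforeInline('('));
        System.out.println(penaltyBeforeInline('a'));
    }
}
```

The point of the design is that the TextLM only hands over a character plus the parent's last character; deciding suppression and favorability stays inside the per-character manager.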

Cheers,

Andreas
