On Oct 28, 2005, at 12:28, Manuel Mall wrote:

On Fri, 28 Oct 2005 04:58 am, Andreas L Delmelle wrote:

(second example)

Same thinking here, b) seems to be the way to go.



We agree but did you notice the difference it would make in visual
appearance if the <inline> just happens to be at the beginning / end of
the line if we follow option a) from the first example? That is if we
have a line break after the A you would get:
[border]Text.[border].B
If we have a line break before the B you would get:
A.[border].Text[border].B

Errm, typo? I'd delete the space before 'B' as well, so fully:
A.[border].Text[border]
B

That is depending on where the linebreaks are there would be a space or
not between the border and the word 'Text'. It is these 'strange' or
'unsymmetric' outcomes which made me think that a border should
possibly act like a fence with respect to whitespace removal (option b)
in the first example).

In a certain way, yes. The white-spaces before 'A' and after 'B' can be literally removed from the stream (so that they don't have a corresponding glyph-area; why create one if you already know it's going to be deleted much further on?), while the other space-sequences can at most be collapsed to one space. These single spaces will automatically have glyph-areas which will or will not be deleted, depending on whether a line-break precedes/follows.

So I agree, but I don't think the borders need to be explicitly tracked/checked for this, as they coincide with the boundaries of the inline anyway. The effect of the border acting as a fence should rather be seen as a consequence, following naturally from the process of whitespace handling. It's the element boundaries --in XML markup terms, not the presence of border properties-- that act as 'fences' (quoted since the term is not really applicable at that level).

My main point was the difference between blocks and inlines in this respect.
For instance, the following different possibilities:

1) <fo:block> A ...
2) <fo:block>       A ...
3) <fo:block> &#x0A;A ...
4) <fo:block>&#x09;&#x0D;&#x0A;&#x20; A ...

would all be treated during layout as if they were
<fo:block>A ...

(supposing default values for all related properties)

The space between 'A' and '...' always remains --whether the '...' refers to content or markup for a nested inline-- but the spaces between the start-block markup and the character 'A' are all dropped (=implicit line-break immediately preceding).

Analogous for XML whitespace between the last non-whitespace char and the end-block markup.
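
To make the block case concrete, here is a minimal sketch in Python (not FOP code) of the behaviour described above, assuming default values for all whitespace-related properties. The function name normalize_block_text is made up for illustration, and the character references from the examples are written as the characters an XML parser would already have resolved them to.

```python
import re

# XML whitespace characters: space, tab, carriage return, linefeed.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_block_text(text: str) -> str:
    """Hypothetical helper: whitespace handling for the text of a block.

    Runs of XML whitespace are collapsed to a single space, and whatever
    remains at the very start or end of the block is dropped, because a
    line-break is implied there.
    """
    collapsed = _XML_WS.sub(" ", text)
    return collapsed.strip(" ")


# The four variants from the list above, after entity resolution.
variants = [
    " A ...",
    "       A ...",
    " \nA ...",
    "\t\r\n  A ...",
]
results = [normalize_block_text(v) for v in variants]
```

All four inputs come out identical, and the space between 'A' and '...' survives.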

For inlines, this becomes nearly the opposite: only if the current inline FO is the first child-node of its parent (first inline in a block, no preceding characters) could we cheat and throw away any whitespace between start-inline and the first non-whitespace character. As a general rule of thumb, though, those white-spaces can at most be collapsed to a single space, since they could end up in the middle of a line area.

Generally, inserting spaces, tabs or linefeeds as the first/last characters of a fo:block should make no difference, but for a fo:inline this would always result in an extra space in the output if it ends up in the middle of a line.
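
The contrast with inlines could be sketched like this, again in Python and again only as an illustration of the reasoning, not of anything in FOP. The function name and the first_in_block flag are assumptions introduced here; the flag stands for the "first child-node of its parent" shortcut mentioned above.

```python
import re

# XML whitespace characters: space, tab, carriage return, linefeed.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_inline_text(text: str, first_in_block: bool = False) -> str:
    """Hypothetical helper: whitespace handling for the text of an inline.

    Runs of whitespace are collapsed to one space but NOT removed at the
    edges, since the inline may end up in the middle of a line. Only when
    the inline is the very first content of its block can the leading
    space be dropped up front.
    """
    collapsed = _XML_WS.sub(" ", text)
    if first_in_block:
        collapsed = collapsed.lstrip(" ")
    return collapsed


mid_line = normalize_inline_text(" \n Text\t ")
at_block_start = normalize_inline_text(" \n Text\t ", first_in_block=True)
```

Here mid_line keeps one space on each side of 'Text', while at_block_start loses only the leading one. Whether the remaining edge spaces are eventually deleted depends on where the line-breaks fall, which is only known during layout.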

Talking nested blocks:

<fo:block> A <fo:block> B ...

is the same as

<fo:block>A<fo:block>B ...
or
<fo:block>A
  <fo:block>B

The above doesn't hold for nested inlines, hence: beware of indent="yes" in XSLT. In case of deeply nested inlines, this could result in the number of spaces in the output increasing with the depth of the fo:inline in the source document. :-)

<snip />
2. after linefeed-treatment is handled, all remaining whitespace
characters are converted internally into fo:characters

This is precisely what the definition of fo:character seems to
prescribe for all characters, but that may be overkill (?)

Yes, logically - practically within FOP no because creating separate FOs
and possibly areas for each character is most likely prohibitive in
terms of memory consumption and processing.
But logically FOP should behave as if that is what is happening.
Especially if we want to implement Unicode compliant line breaking,
bidi, etc. This needs to be done on a per paragraph basis and not
on a per 'text section' basis as is now. That is, the analysis of where a
line break opportunity exists must go across <inline> boundaries,
include <fo:characters>, etc.

Not necessarily separate FOs, but the same type of LayoutManager would probably be more in the right direction. CharLM (or a subclass?) should be able to operate on either an attached fo:character or a simple char instance variable; it would be instantiated either from an explicit fo:character object, or by the TextLM responsible for the larger context when it encounters a Unicode whitespace character (instead of creating the elements for whitespace itself, the TextLM instantiates a CharLM to delegate to?).

At the same time, the TextLM's operating context for line-breaking should indeed always be the full block/paragraph, instead of merely the text of the current inline. Maybe this could also be dealt with by passing state info from the parent's TextLM into the inline's own TextLM, so that it can use that to answer the question whether there is a legal break-opportunity before the inline (e.g. the last character before the inline was of a Unicode class that prohibits linebreaks after it, so an infinite penalty for breaking before...).

It doesn't matter that much whether a break-opportunity is created. The most important thing is that the opportunity is given the appropriate degree of favorability, taking into account the constraints for Unicode line-breaking across FO element boundaries. The CharLM would deal with determining the value of suppress-at-line-break for its associated character (if Unicode whitespace), and generate an appropriate sequence of elements.
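
The state-passing idea (the parent's TextLM tells the inline's TextLM what the last character before the inline was) might look roughly like the following Python sketch. Everything in it is hypothetical: the function name, the INFINITE value, and the tiny NO_BREAK_AFTER set, which is only a hand-picked sample of characters after which Unicode line breaking does not allow a break, not a real line-break class lookup.

```python
from typing import Optional

# Illustrative stand-in for an "infinite" penalty in a Knuth-style
# element list; the actual value is an assumption of this sketch.
INFINITE = 1000

# Hand-picked sample of characters that do not allow a break after them:
# NO-BREAK SPACE, WORD JOINER, and some opening punctuation.
NO_BREAK_AFTER = {"\u00A0", "\u2060", "(", "[", "{"}


def penalty_before_inline(last_char_before: Optional[str],
                          default_penalty: int = 0) -> int:
    """Hypothetical helper: penalty for a break right before an inline.

    last_char_before is the state handed down from the parent: the last
    character preceding the inline, or None if there is none.
    """
    if last_char_before is not None and last_char_before in NO_BREAK_AFTER:
        return INFINITE
    return default_penalty


after_paren = penalty_before_inline("(")
after_space = penalty_before_inline(" ")
```

The point of the sketch is that a break-opportunity can always be emitted at the inline boundary; only its penalty changes with the surrounding context.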

...or something like that?

Cheers,

Andreas
