On Oct 28, 2005, at 12:28, Manuel Mall wrote:
On Fri, 28 Oct 2005 04:58 am, Andreas L Delmelle wrote:
(second example)
Same thinking here, b) seems to be the way to go.
We agree, but did you notice the difference it would make in visual
appearance if the <inline> just happens to be at the beginning / end of
the line if we follow option a) from the first example? That is, if we
have a line break after the A you would get:
[border]Text.[border].B
If we have a line break before the B you would get:
A.[border].Text[border].B
Errm, typo? I'd delete the space before 'B' as well, so fully:
A.[border].Text[border]
B
That is, depending on where the linebreaks are, there would be a space
or not between the border and the word 'Text'. It is these 'strange' or
'unsymmetric' outcomes which made me think that a border should
possibly act like a fence with respect to whitespace removal (option b)
in the first example).
In a certain way, yes. The white-spaces before 'A' and after 'B' can
be literally removed from the stream (so that they don't have a
corresponding glyph-area; why create one if you already know it's
going to be deleted much further on?), while the other space-
sequences can at most be collapsed to one space. These single spaces
will automatically have glyph-areas which will or will not be
deleted, depending on whether a line-break precedes/follows.
So I agree, but I don't think the borders need to be explicitly
tracked/checked for this, as they coincide with the boundaries of
the inline anyway. The effect of the border acting as a fence should
more be seen as a consequence, following naturally from the process
of whitespace handling. It's the element borders --but in XML markup
terms, not the presence of border properties-- that act as
'fences' (quoted since the term is not really applicable at that level).
My main point was the difference between blocks and inlines in this
respect.
For instance, the following different possibilities:
1) <fo:block> A ...
2) <fo:block> A ...
3) <fo:block> 
A ...
4) <fo:block>	
  A ...
would all be treated during layout as if they were
<fo:block>A ...
(supposing default values for all related properties)
The space between 'A' and '...' always remains --whether the '...'
refers to content or markup for a nested inline-- but the spaces
between the start-block markup and the character 'A' are all dropped
(=implicit line-break immediately preceding).
Analogous for XML whitespace between the last non-whitespace char and
the end-block markup.
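
To make the block case concrete, here is a rough Java sketch of the
behaviour described above. It is only an illustration, assuming default
values for all whitespace-related properties. The names
BlockWhitespaceSketch and normalizeBlockText are invented for this mail
and are not existing FOP code.

```java
final class BlockWhitespaceSketch {

    // XML whitespace only: space, tab, linefeed, carriage return.
    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Hypothetical helper: drops whitespace directly after the start-block
    // markup and directly before the end-block markup, and collapses every
    // other whitespace run to a single space.
    static String normalizeBlockText(String raw) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (isXmlSpace(c)) {
                // Leading whitespace (nothing emitted yet) is never remembered.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                }
                pendingSpace = false;
                out.append(c);
            }
        }
        // A space still pending here is trailing whitespace and is dropped.
        return out.toString();
    }
}
```

With this, " A ...", a linefeed plus indentation before "A ...", and a
tab plus linefeed plus indentation before "A ..." all give the same
string, "A ...".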
For inlines, this becomes nearly the opposite: only if the current
inline FO is the first child-node of its parent (first inline in a
block, no preceding characters) could we cheat and throw away any
whitespace between start-inline and the first non-whitespace. As a
general rule of thumb, though, those white-spaces can at most be
collapsed to a single space, since they could end up in the middle of
a line area.
Generally, inserting spaces, tabs or linefeeds as the first/last
characters of a fo:block should make no difference, but for a
fo:inline this would always result in an extra space in the output if
it ends up in the middle of a line.
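
The inline case, by contrast, could look roughly like the sketch below.
Again an illustration only: InlineWhitespaceSketch, normalizeInlineText
and the firstChildOfBlock flag are names made up here, and the flag
simply stands in for "first inline in a block, no preceding
characters".

```java
final class InlineWhitespaceSketch {

    // Hypothetical helper: whitespace runs inside an inline are collapsed to
    // one space and kept, because the inline may end up mid-line. Only the
    // leading run of an inline that is the first child of its block is dropped.
    static String normalizeInlineText(String raw, boolean firstChildOfBlock) {
        StringBuilder out = new StringBuilder();
        boolean inRun = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                inRun = true;
                continue;
            }
            boolean leadingRun = out.length() == 0;
            if (inRun && !(leadingRun && firstChildOfBlock)) {
                out.append(' ');
            }
            inRun = false;
            out.append(c);
        }
        // A trailing run is kept as one space; whether its glyph-area survives
        // is decided later, depending on where the line breaks fall.
        if (inRun && (out.length() > 0 || !firstChildOfBlock)) {
            out.append(' ');
        }
        return out.toString();
    }
}
```

So two spaces, "Text", a linefeed and a space inside an fo:inline
become " Text " in the general case, and "Text " only when the inline
opens the block.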
Talking nested blocks:
<fo:block> A <fo:block> B ...
is the same as
<fo:block>A<fo:block>B ...
or
<fo:block>A
<fo:block>B
The above doesn't hold for nested inlines, hence: beware of
indent="yes" in XSLT. In case of deeply nested inlines, this could
result in the number of spaces in the output increasing with the
depth of the fo:inline in the source document. :-)
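
A small sketch of that indent="yes" effect, under the assumption made
above that each text node is collapsed on its own and nothing is
collapsed across inline boundaries. NestedInlineIndentSketch and its
methods are hypothetical names, and the list stands in for the text
nodes met in document order.

```java
import java.util.List;

final class NestedInlineIndentSketch {

    // Collapse every run of XML whitespace in one text node to a single space.
    static String collapse(String textNode) {
        return textNode.replaceAll("[ \\t\\r\\n]+", " ");
    }

    // Concatenate the independently collapsed text nodes, as if the inline
    // boundaries between them acted as 'fences' for collapsing.
    static String flatten(List<String> textNodes) {
        StringBuilder sb = new StringBuilder();
        for (String t : textNodes) {
            sb.append(collapse(t));
        }
        return sb.toString();
    }
}
```

Feeding it "A" plus indentation, then two whitespace-only nodes created
by the indentation of two nested fo:inlines, then "B", leaves three
spaces between A and B: one per nesting level.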
<snip />
2. after linefeed-treatment is handled, all remaining whitespace
characters are converted internally into fo:characters
This is precisely what the definition of fo:character seems to
prescribe for all characters, but that may be overkill (?)
Yes, logically - practically within FOP no, because creating separate
FOs and possibly areas for each character is most likely prohibitive in
terms of memory consumption and processing.
But logically FOP should behave as if that is what is happening,
especially if we want to implement Unicode compliant line breaking,
bidi, etc. This needs to be done on a per paragraph basis and not on a
per 'text section' basis as is now. That is, the analysis of where a
line break opportunity is must go across <inline> boundaries, include
<fo:character>s, etc.
Not necessarily separate FOs, but the same type of LayoutManager
would probably be more in the right direction. CharLM (or subclass?)
should be able to operate on either an attached fo:character or a
simple char instance variable; instantiated either from an explicit
fo:character object, or by the TextLM responsible for the larger
context from a Unicode whitespace character it encounters (instead of
creating the elements for whitespace itself, the TextLM instantiates
a CharLM to delegate?)
At the same time, the TextLM's operating context for line-breaking
should indeed always be the full block/paragraph, instead of merely
the text of the current inline. Maybe this could also be dealt with
by passing state info from the parent's TextLM into the inline's own
TextLM, so that it can use that to answer the question whether there
is a legal break-opportunity before the inline. (ex.: last character
before inline was of Unicode-class that prohibits linebreaks after,
so an infinite penalty for breaking before...) It doesn't matter that
much whether a break-opportunity is created. The most important thing
is that the opportunity is given the appropriate degree of
favorability, taking into account the constraints for Unicode line-
breaking across FO element boundaries.
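
To illustrate the idea of handing state down, a very rough sketch
follows. Everything in it is hypothetical: BreakContext,
penaltyBeforeInline and the INFINITE value are invented for this mail,
and the rules cover only a tiny, simplified subset of the Unicode
line-breaking classes.

```java
final class InlineBreakContextSketch {

    // Hypothetical "never break here" penalty value.
    static final int INFINITE = 1000;
    // Marker for "no character precedes the inline in this block".
    static final int NO_CHAR = -1;

    // State the parent's TextLM would pass into the inline's own TextLM.
    static final class BreakContext {
        final int lastCodePoint;

        BreakContext(int lastCodePoint) {
            this.lastCodePoint = lastCodePoint;
        }
    }

    // Penalty for a break immediately before the inline, judged only from the
    // last character that preceded it.
    static int penaltyBeforeInline(BreakContext ctx) {
        int cp = ctx.lastCodePoint;
        if (cp == NO_CHAR) {
            return INFINITE; // start of block: a line break already precedes
        }
        if (cp == 0x00A0 || cp == 0x2060) {
            return INFINITE; // no-break space, word joiner
        }
        if (Character.getType(cp) == Character.START_PUNCTUATION) {
            return INFINITE; // opening bracket: no break after it
        }
        if (cp == 0x0020 || cp == 0x200B) {
            return 0; // space or zero width space: ordinary opportunity
        }
        return INFINITE; // anything else is treated as "inside a word" here
    }
}
```

The point is only that a penalty always comes back, so the break
position before the inline gets a degree of favorability that depends
on what preceded it.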
The CharLM would deal with determining the value of
suppress-at-line-break for its associated character (if Unicode
whitespace), and generate an appropriate sequence of elements.
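
As a sketch of what that could mean, assuming my reading of the
Recommendation is right that "auto" behaves like "suppress" for U+0020
and like "retain" for every other character. CharLmSketch, the two
enums and elementsFor are hypothetical names, and mapping suppressed
spaces to glue and everything else to a box is a simplification of
mine.

```java
import java.util.ArrayList;
import java.util.List;

final class CharLmSketch {

    // Specified values of suppress-at-line-break (inherit left out).
    enum Suppress { AUTO, SUPPRESS, RETAIN }

    // Simplified element kinds: glue is discarded at a chosen break, a box is not.
    enum ElementKind { BOX, GLUE }

    // Resolve the property for one character.
    static boolean isSuppressedAtLineBreak(int codePoint, Suppress specified) {
        switch (specified) {
            case SUPPRESS:
                return true;
            case RETAIN:
                return false;
            default:
                return codePoint == 0x0020; // AUTO: only the plain space
        }
    }

    // Hypothetical element sequence for a single character.
    static List<ElementKind> elementsFor(int codePoint, Suppress specified) {
        List<ElementKind> seq = new ArrayList<>();
        seq.add(isSuppressedAtLineBreak(codePoint, specified)
                ? ElementKind.GLUE
                : ElementKind.BOX);
        return seq;
    }
}
```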
...or something like that?
Cheers,
Andreas