On Nov 3, 2005, at 08:53, Manuel Mall wrote:
On Thu, 3 Nov 2005 06:03 am, J.Pietschmann wrote:
Computing line breaking opportunities and discarding whitespace at
the end (or beginning) of a line are different matters. If whitespace
has to be retained, trailing spaces after a non-space string may
simply mean the previous line breaking opportunity has to be used,
because otherwise the string including the trailing spaces will
overflow the line area. The trailing whitespace may also influence
Hmm, to me it appears that UNICODE and XSL-FO have slightly different
models when it comes to white space in the context of line breaking
which is causing the discussion here. In UNICODE everything is based
simply on the properties of the codepoint in question and its
neighbour. In XSL-FO one can change the behaviour of a codepoint by
setting those white space related XSL-FO properties.
Hmm, apart from suppress-at-line-break (which is a more general
property, not specific wrt whitespace), the whitespace-related
properties only deal with XML whitespace (which is obviously not the
same as Unicode whitespace, but a very small subset thereof).
During refinement, all whitespace other than U+0020, U+0009, U+000D
and U+000A is left alone.
At that stage, it's only these four codepoints' behavior that can be
influenced/changed by the three properties: white-space-treatment,
linefeed-treatment and white-space-collapse.
This means that a sequence of nbsp-space-zwsp-space-nbsp should
arrive in layout untouched (never collapsed).
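
To make the point about refinement concrete, here is a minimal sketch. It assumes a much simplified reading of white-space-collapse="true", where an XML whitespace character is dropped when it directly follows another one. The names XML_WHITESPACE and collapse_xml_whitespace are made up for illustration and are not taken from the FOP code base.

```python
# The four XML whitespace codepoints that the refinement properties can affect.
XML_WHITESPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}


def collapse_xml_whitespace(text: str) -> str:
    """Simplified stand-in for white-space-collapse="true".

    Drops an XML whitespace character that directly follows another
    XML whitespace character. Every other codepoint, including other
    Unicode whitespace, is passed through unchanged.
    """
    out = []
    prev_was_xml_ws = False
    for ch in text:
        is_xml_ws = ch in XML_WHITESPACE
        if is_xml_ws and prev_was_xml_ws:
            continue  # collapse: skip the repeated XML whitespace
        out.append(ch)
        prev_was_xml_ws = is_xml_ws
    return "".join(out)


NBSP, ZWSP = "\u00A0", "\u200B"
# nbsp-space-zwsp-space-nbsp: no two XML whitespace characters are adjacent.
sample = NBSP + " " + ZWSP + " " + NBSP
```

In this sketch the sample sequence comes back unchanged, because NBSP and ZWSP are not XML whitespace and so they separate the two regular spaces.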
As Joerg points out, discarding whitespace at line-breaks and
computing those line-breaks are two different issues.
If I get the intention correctly, we shouldn't be following Unicode
UAX#14 wherever it mentions white-space-removal/-retaining around
possible breaks, except for non-XML-whitespace (as we implement a
Recommendation that, at least from our POV, supersedes what Unicode
says about this). We're using UAX#14 only to determine the feasible/
most desirable (non-)breaks.
If UAX#14 always breaks at the end of a sequence of spaces, then this
tells us only that doing so would use the most desirable break-
opportunity. If anything, it seems to make the job less complicated,
because this means that we will practically never have to consider
cases of whitespace following a line-break, no? Only in case of
Correct me if I'm wrong, but such a space sequence would correspond
to a Knuth element sequence with the break-before penalty gradually
increasing and the break-after penalty decreasing for each
consecutive space, such that, when the decision has to be made where
to break, the break-after for the last space will be chosen if
possible. A break before a space is feasible, but not preferable to
breaking after it; likewise, breaking after the first space should be
marked less preferable than breaking after the last one.
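
A rough sketch of that penalty gradient, just to check the idea. It does not build real Knuth box/glue/penalty elements. It only assigns one penalty value per break position inside a run of spaces, and the function names and the step size of 10 are arbitrary assumptions.

```python
from typing import Iterable, List, Optional


def space_run_penalties(n_spaces: int, step: int = 10) -> List[int]:
    """Penalty for each break position in a run of n_spaces spaces.

    Index i means "break after i spaces", so index 0 is the break
    before the first space and index n_spaces is the break after the
    last one. The penalty decreases towards the end of the run.
    """
    return [step * (n_spaces - i) for i in range(n_spaces + 1)]


def choose_break(penalties: List[int], feasible: Iterable[int]) -> Optional[int]:
    """Pick the feasible break position with the lowest penalty.

    Returns None when no feasible position lies inside the run.
    """
    candidates = [i for i in feasible if 0 <= i < len(penalties)]
    if not candidates:
        return None
    return min(candidates, key=lambda i: penalties[i])


penalties = space_run_penalties(3)
```

With three spaces this gives penalties 30, 20, 10, 0. If all four positions are feasible, the break after the last space is chosen. If only positions 0 and 1 fit on the line, the break after the first space is chosen over the break before it.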
What happens with immediately preceding XML whitespace (or explicit
fo:characters with overrides for default suppress-at-line-break), is
then again determined by the white-space-treatment of the containing
block. In this respect, the default rules are pretty simple: all
glyph areas (or non-areas, which could still be relevant to possible
FO extensions) for whitespace characters are retained, except regular
spaces, or fo:characters with explicit "suppress".
That is not a concept within UNICODE. If you want to retain white
space in UNICODE you use a different codepoint. If you want to retain a
space in XSL-FO you could use a different codepoint, but more likely you
would set an XSL-FO property if you want this applied widely in your
document.
If we want to 'marry' UNICODE linebreaking with XSL-FO white space
handling we have this interaction to consider. One possible solution
would be to replace spaces (U+0020) by different codepoints which
resemble the behaviour modification imposed by any XSL-FO white space
handling properties in effect.
Not really 'replace' but 'treat-as-if' (generate a Knuth sequence
analogous to codepoint ...)
But I am not sure if this can be done in
all cases. Otherwise we may have to modify the UNICODE line breaking
algorithm to cater for the XSL-FO white space specialities.
Hm. Same as Joerg, I also see these two operate at different levels.
Don't know if I'm seeing this correctly, but the line breaking
algorithm operates at the level where we have to decide the value of
the penalties for breaking before or after a particular type of
whitespace (or non-whitespace), whereas the white-space treatment
refers to line-building, i.e. what happens (or has to happen) once the
break is 'elected'.
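
To illustrate the second level, a small sketch of the line-building step, run after the breaks have already been elected. It assumes the simplest possible rule: only regular spaces (U+0020) adjacent to an elected break are suppressed and everything else is retained. build_lines and its parameters are hypothetical, and the real behaviour would depend on suppress-at-line-break and white-space-treatment.

```python
from typing import List, Sequence


def build_lines(text: str, breaks: Sequence[int],
                suppress_spaces: bool = True) -> List[str]:
    """Cut text at the elected break offsets and build the lines.

    breaks holds ascending character offsets at which a break was
    elected. When suppress_spaces is true, U+0020 characters directly
    before or after an elected break are dropped. Other whitespace,
    such as U+00A0, is always retained.
    """
    # Step 1: split at the elected breaks, nothing removed yet.
    pieces = []
    start = 0
    for pos in list(breaks) + [len(text)]:
        pieces.append(text[start:pos])
        start = pos

    # Step 2: whitespace treatment around each elected break.
    lines = []
    for i, line in enumerate(pieces):
        if suppress_spaces:
            if i > 0:
                line = line.lstrip(" ")   # spaces after a break
            if i < len(pieces) - 1:
                line = line.rstrip(" ")   # spaces before a break
        lines.append(line)
    return lines
```

The break offsets come in from outside, so the penalty computation and the whitespace treatment stay independent of each other, which is the separation Joerg described.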