On Nov 3, 2005, at 08:53, Manuel Mall wrote:

On Thu, 3 Nov 2005 06:03 am, J.Pietschmann wrote:

Computing line breaking opportunities and discarding whitespace at
the end (or beginning) of a line are different matters. If whitespace
has to be retained, trailing spaces after a non-space string may
simply mean the previous line breaking opportunity has to be used,
because otherwise the string including the trailing spaces will
overflow the line area. The trailing whitespace may also influence
text justification.


Hmm, to me it appears that UNICODE and XSL-FO have slightly different
models when it comes to white space in the context of line breaking
which is causing the discussion here. In UNICODE everything is based
simply on the properties of the codepoint in question and its
neighbour. In XSL-FO one can change the behaviour of a codepoint by
setting those white space related XSL-FO properties.

Hmm, apart from suppress-at-line-break (which is a more general property, not specific to whitespace), the whitespace-related properties deal only with XML whitespace (which is obviously not the same as Unicode whitespace, but a very small subset of it).

During refinement, all whitespace other than U+0020, U+0009, U+000D and U+000A is left alone. At that stage, it's only these four codepoints' behavior that can be influenced/changed by the three properties: white-space-treatment, linefeed-treatment and white-space-collapse. This means that a sequence of nbsp-space-zwsp-space-nbsp should arrive in layout untouched (never collapsed).
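To make the point above concrete, here is a minimal Python sketch. It is a simplified model, not the full refinement rule from the Recommendation: it ignores linefeed-treatment and white-space-treatment and only shows that collapsing touches the four XML whitespace codepoints. The names `XML_WHITESPACE` and `collapse_xml_whitespace` are made up for illustration.

```python
# Hypothetical helper names; simplified model of collapsing only.
XML_WHITESPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}


def is_xml_whitespace(ch: str) -> bool:
    """True only for the four XML whitespace codepoints."""
    return ch in XML_WHITESPACE


def collapse_xml_whitespace(text: str) -> str:
    """Keep the first character of each run of XML whitespace, drop the rest.

    Any other Unicode whitespace (nbsp, zwsp, ...) is passed through
    untouched and also ends the current run.
    """
    out = []
    prev_was_xml_ws = False
    for ch in text:
        ws = is_xml_whitespace(ch)
        if ws and prev_was_xml_ws:
            continue  # collapsed away
        out.append(ch)
        prev_was_xml_ws = ws
    return "".join(out)


# nbsp-space-zwsp-space-nbsp: no two XML whitespace characters are adjacent,
# so nothing is collapsed.
sample = "\u00A0\u0020\u200B\u0020\u00A0"
assert collapse_xml_whitespace(sample) == sample
```

In this model the zwsp between the two spaces is what keeps them from forming a single run.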

As Joerg points out, discarding whitespace at line breaks and computing those line breaks are two different issues. If I understand the intention correctly, we shouldn't follow Unicode UAX#14 where it mentions removing or retaining whitespace around eventual breaks, except for non-XML whitespace (as we implement a Recommendation that, at least from our POV, supersedes what Unicode says about this). We're using UAX#14 only to determine the feasible/most desirable (non-)breaks.

If UAX#14 always breaks at the end of a sequence of spaces, then this tells us only that doing so would use the most desirable break opportunity. If anything, it seems to make the job less complicated, because it means that we will practically never have to consider cases of whitespace following a line-break, no? Only in case of explicit linefeed-treatment="preserve"... Correct me if I'm wrong, but such a space sequence would correspond to a Knuth element sequence with the break-before penalty gradually increasing and the break-after penalty decreasing for each consecutive space, such that, when the decision has to be made where to break, the break-after for the last space will be chosen if possible. A break before a space is feasible, but not preferable to breaking after it, and breaking after the first space should be marked less preferable than breaking after the last one. What happens with immediately preceding XML whitespace (or explicit fo:characters with overrides for default suppress-at-line-break) is then again determined by the white-space-treatment of the containing block. In this respect, the default rules are pretty simple: all glyph areas (or non-areas, which could still be relevant to possible FO extensions) for whitespace characters are retained, except regular spaces, or fo:characters with explicit "suppress".
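A rough Python sketch of the penalty scheme I have in mind for a run of spaces. The numbers and the names `space_run_penalties` and `preferred_break` are pure assumptions for illustration and do not come from any existing code; only the ordering of the penalties matters.

```python
def space_run_penalties(n, step=10):
    """Return one (break_before, break_after) pair per space in a run of n.

    Assumed scheme: break-before grows and break-after shrinks with each
    consecutive space, and every break-before is costlier than every
    break-after, so breaking after the last space is the cheapest option.
    """
    penalties = []
    for i in range(n):
        break_before = step * (n + i)      # increasing along the run
        break_after = step * (n - 1 - i)   # decreasing, 0 for the last space
        penalties.append((break_before, break_after))
    return penalties


def preferred_break(penalties):
    """Pick the cheapest candidate as a (kind, space_index) pair."""
    candidates = []
    for i, (before, after) in enumerate(penalties):
        candidates.append((before, "before", i))
        candidates.append((after, "after", i))
    _, kind, index = min(candidates)
    return kind, index


run = space_run_penalties(3)
assert run == [(30, 20), (40, 10), (50, 0)]
assert preferred_break(run) == ("after", 2)
```

A real breaker would of course weigh these penalties against the demerits of the rest of the line; the sketch only shows the relative preference within the run.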

That is not a concept within UNICODE. If you want to retain white space in UNICODE you use a different codepoint. If you want to retain a space in XSL-FO you could use a different codepoint, but more likely you set an
XSL-FO property if you want this applied widely in your document.

If we want to 'marry' UNICODE linebreaking with XSL-FO white space
handling we have this interaction to consider. One possible solution
would be to replace spaces (U+0020) by different codepoints which
resemble the behaviour modification imposed by any XSL-FO white space
handling properties in effect.

Not really 'replace' but 'treat-as-if' (generate a Knuth sequence analogous to codepoint ...)


But I am not sure if this can be done in
all cases. Otherwise we may have to modify the UNICODE line breaking
algorithm to cater for the XSL-FO white space specialities.

Hm. Like Joerg, I also see these two operating at different levels. I don't know if I'm seeing this correctly, but the line-breaking algorithm operates at the level where we have to decide the value of the penalties for breaking before or after a particular type of whitespace (or non-whitespace), whereas the white-space treatment belongs to line-building: what happens, or has to happen, once the break is 'elected'.
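To illustrate the second level (line-building), a small Python sketch of what could happen once a break position has been elected. It assumes a heavily simplified model with just two modes, "preserve" and "not preserve", and only suppresses regular spaces (U+0020) around the break. `split_at_break` and its `preserve` flag are hypothetical names, not an existing API.

```python
def split_at_break(text, pos, preserve=False):
    """Split text at an already elected break position.

    The break decision itself is made elsewhere (by the penalties); this
    step only decides what happens to the spaces next to the break.
    """
    before, after = text[:pos], text[pos:]
    if not preserve:
        # Only regular spaces are suppressed; nbsp and friends are retained.
        before = before.rstrip("\u0020")
        after = after.lstrip("\u0020")
    return before, after


# Break elected after the last of three spaces.
assert split_at_break("foo   bar", 6) == ("foo", "bar")
assert split_at_break("foo   bar", 6, preserve=True) == ("foo   ", "bar")
# A no-break space before the break is retained, the regular space is not.
assert split_at_break("foo\u00A0 bar", 5) == ("foo\u00A0", "bar")
```

The same break position gives different line contents depending on the treatment, which is why I see the two as separate concerns.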

Cheers,

Andreas
