Re: Leading/trailing space removal in LineLM

Andreas L Delmelle Thu, 03 Nov 2005 14:26:22 -0800

On Nov 3, 2005, at 08:53, Manuel Mall wrote:

On Thu, 3 Nov 2005 06:03 am, J.Pietschmann wrote:

Computing line breaking opportunities and discarding whitespace at
the end (or beginning) of a line are different matters. If whitespace
has to be retained, trailing spaces after a non-space string may
simply mean the previous line breaking opportunity has to be used,
because otherwise the string including the trailing spaces will
overflow the line area. The trailing whitespace may also influence
text justification.

Hmm, to me it appears that UNICODE and XSL-FO have slightly different
models when it comes to white space in the context of line breaking
which is causing the discussion here. In UNICODE everything is based
simply on the properties of the codepoint in question and its
neighbour. In XSL-FO one can change the behaviour of a codepoint by
setting those white space related XSL-FO properties.

Hmm, apart from suppress-at-line-break (which is a more general
property, not specific wrt whitespace), the whitespace-related
properties only deal with XML whitespace (which is obviously not the
same as Unicode whitespace, but a very small subset thereof).

During refinement, all whitespace other than U+0020, U+0009, U+000D
and U+000A is left alone.
At that stage, it's only these four codepoints' behavior that can be
influenced/changed by the three properties: white-space-treatment,
linefeed-treatment and white-space-collapse.
This means that a sequence of nbsp-space-zwsp-space-nbsp should
arrive in layout untouched (never collapsed).
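To make the nbsp-space-zwsp-space-nbsp case concrete, here is a minimal Python sketch of collapsing that only ever touches the four XML whitespace codepoints. The function name is hypothetical, and the rule is deliberately simplified: it drops any XML whitespace character that directly follows another one, and ignores the interplay with linefeed-treatment and white-space-treatment that the real refinement step has to respect.

```python
# The four XML whitespace codepoints named above; everything else is left alone.
XML_WHITESPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}

def collapse_xml_whitespace(text):
    """Simplified collapse (an assumption, not the full XSL-FO rule):
    drop an XML whitespace character that directly follows another one."""
    out = []
    for ch in text:
        if ch in XML_WHITESPACE and out and out[-1] in XML_WHITESPACE:
            continue  # previous kept character was XML whitespace too
        out.append(ch)
    return "".join(out)

NBSP, ZWSP = "\u00A0", "\u200B"
# No two XML whitespace characters are adjacent here, so nothing collapses.
mixed = NBSP + " " + ZWSP + " " + NBSP
```

Since NBSP and ZWSP are not in the set, they act as separators between the two regular spaces, and the whole sequence reaches layout unchanged.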

As Joerg points out, discarding whitespace at line-breaks and
computing those line-breaks are two different issues.
If I get the intention correctly, we shouldn't be following Unicode
UAX#14 wherever it mentions white-space-removal/-retaining around
eventual breaks except for non-XML-whitespace (as we implement a
Recommendation that, at least from our POV, supersedes what Unicode
says about this). We're using UAX#14 only to determine the feasible/
most desirable (non-)breaks.

If UAX#14 always breaks at the end of a sequence of spaces, then this
tells us only that doing so would use the most desirable break-
opportunity. If anything, it seems to make the job less complicated,
because this means that we will practically never have to consider
cases of whitespace following a line-break, no? Only in case of
explicit linefeed-treatment="preserve"...

Correct me if I'm wrong, but such a space sequence would correspond
to a Knuth element sequence with the break-before penalty gradually
increasing and the break-after penalty decreasing for each
consecutive space, such that, when the decision has to be made where
to break, the break-after for the last space will be chosen if
possible. A break before a space is feasible, but not preferable to
breaking after it; breaking after the first space should be marked
less preferable than breaking after the last one.

What happens with immediately preceding XML whitespace (or explicit
fo:characters with overrides for default suppress-at-line-break), is
then again determined by the white-space-treatment of the containing
block. In this respect, the default rules are pretty simple: all
glyph areas (or non-areas, which could still be relevant to possible
FO extensions) for whitespace characters are retained, except regular
spaces, or fo:characters with explicit "suppress".
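A rough Python sketch of the penalty idea for a run of spaces, under loud assumptions: the function names, the base value and the step size are invented for illustration, and the separate break-before/break-after penalties are folded into a single penalty per break position (position k means "break after k spaces of the run"). It only shows the ordering of preferences, not an actual Knuth element sequence.

```python
def space_run_penalties(run_length, base=100, step=10):
    """Penalty for breaking after k spaces of the run, for k = 0..run_length.

    k = 0 is the break before the first space (feasible, but most expensive);
    k = run_length is the break after the last space (cheapest).
    The numbers are hypothetical; only their ordering matters.
    """
    return [base + step * (run_length - k) for k in range(run_length + 1)]

def preferred_break(run_length):
    """Return the position with the lowest penalty within the run."""
    penalties = space_run_penalties(run_length)
    return min(range(len(penalties)), key=penalties.__getitem__)
```

With three spaces the penalties decrease monotonically towards the end of the run, so the break after the last space is elected whenever it is available, while the earlier positions stay feasible as fallbacks.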

That is not a concept within UNICODE. If you want to retain whitespace
in UNICODE you use a different codepoint. If you want to retain a
space in XSL-FO you could use a different codepoint but more likely
you set an XSL-FO property if you want this applied widely in your
document.

If we want to 'marry' UNICODE linebreaking with XSL-FO white space
handling we have this interaction to consider. One possible solution
would be to replace spaces (U+0020) by different codepoints which
resemble the behaviour modification imposed by any XSL-FO white space
handling properties in effect.

Not really 'replace' but 'treat-as-if' (generate a Knuth sequence
analogous to codepoint ...)

But I am not sure if this can be done in
all cases. Otherwise we may have to modify the UNICODE line breaking
algorithm to cater for the XSL-FO white space specialities.

Hm. Same as Joerg, I also see these two operate at different levels.
Don't know if I'm seeing this correctly, but the line breaking
algorithm operates at the level where we have to decide the value of
the penalties for breaking before or after a particular type of
whitespace (or non-whitespace), whereas the white-space treatment
refers to line-building, so what happens/has to happen in case the
break is 'elected'.
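To illustrate the line-building level in isolation, here is a small Python sketch that takes a break position as already decided and only deals with what is dropped around it. The helper names are hypothetical. It assumes that suppress-at-line-break="auto" means "suppress" for U+0020 and "retain" for every other character, uses a single property value for the whole text instead of per-fo:character overrides, and leaves white-space-treatment out entirely.

```python
def is_suppressed(ch, suppress_at_line_break="auto"):
    """Decide whether a character is dropped when it ends up next to a break.

    Assumption: "auto" suppresses only the regular space U+0020.
    """
    if suppress_at_line_break == "auto":
        return ch == "\u0020"
    return suppress_at_line_break == "suppress"

def build_lines(text, break_index):
    """Split text at an already elected break and strip suppressed
    characters on both sides of it. Choosing break_index is the job of
    the line breaking algorithm and is not modelled here."""
    first, second = text[:break_index], text[break_index:]
    while first and is_suppressed(first[-1]):
        first = first[:-1]
    while second and is_suppressed(second[0]):
        second = second[1:]
    return first, second
```

For example, build_lines("foo   bar", 6) drops the three trailing spaces of the first line, whereas a non-breaking space in front of the break survives, because only U+0020 is suppressed by default.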


Cheers,

Andreas
