On Tue, 8 Nov 2005 04:40 am, Simon Pepping wrote:
> I have taken my time, but here is my reaction to the Wiki page on
> white space handling. In addition, I have written my own view on the
> XSL-FO spec's handling of white space in a Wiki page.

Simon,

your efforts are very much appreciated - at least by me. Your Wiki page 
presents white space handling from a different angle (paraphrased: 
editors can modify the XML by adding spaces and linefeeds, and white 
space handling is mainly for dealing with those modifications). I think 
that is a very good perspective to take. 

>
> Step 2. Refinement: white-space-collapse
> ========================================
>
> Issue 1. The spec intentionally addresses only XML white space,
> because only such white space is manipulated by editors to obtain
> pretty printing.

Point taken, although I have no experience with non-Western editors. Do 
they all use 0x20 for 'pretty printing'?

>
> Issue 2. The spec intentionally addresses only the collapse of white
> space around linefeed characters, because only such white space is
> manipulated by editors to obtain pretty printing. Even if linefeed
> characters indicate real line breaks and are preserved, it is
> possible that the editor has introduced sequences of XML white space
> characters for pretty printing.
>

OK

> Issue 3. White-space-collapse is formulated in terms of space
> characters which do not generate an area. That is similar to the
> space resolution rules, where space specifiers get a zero width.
> Since there is no merging of white space glyph areas into a single
> area, there is no contradiction with the condition for glyph merging
> in section 4.7.2. The space glyph area that does generate an area,
> determines the traits of that area.
>
Yes - but my point was this: if someone writes
<fo:character background-color="green" 
character="&#x1234;"/><fo:character background-color="red" 
character="&#x4321;"/>
and &#x1234; and &#x4321; are mergeable according to the rules of the 
script, then we are not allowed to merge them because they don't have 
matching traits. But if someone writes
<fo:character background-color="green" character="&#x20;"/><fo:character 
background-color="red" character="&#x20;"/>
these would be removed / collapsed / deleted under the white space 
rules, even though their traits differ.

Here is a more extreme example:
&#x20;<fo:character border="solid ...." character="&#x20;"/>
Under white space collapse the whole fo:character, border included, 
disappears. If you write
&#x20;<fo:inline border="solid ....">&#x20;</fo:inline>
at least the border is retained, and whether the space survives depends 
on whether the sequence is at the beginning or end of a line.

Anyway, it is a bit academic as the spec is quite clear: if the Unicode 
value is U+0020, be it in an fo:character (during refinement) or a 
glyph area (during line building), it is subject to white space 
handling independent of any other properties / traits defined on it.
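
To make the contrast concrete, here is a minimal Python sketch (not FOP 
code). The Char class, the single "background" trait and both function 
names are made up for illustration, and the collapse rule is reduced to 
"a space directly after another space is dropped", which is only part of 
what the spec says. The point is merely that collapse never looks at 
traits, while glyph merging does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """Hypothetical model of one fo:character: a code point plus one trait."""
    char: str
    background: str = "transparent"


def collapse_spaces(chars):
    """Simplified white-space-collapse: drop a U+0020 that directly
    follows another U+0020. Traits are deliberately never inspected."""
    out = []
    prev_was_space = False
    for c in chars:
        is_space = c.char == "\u0020"
        if is_space and prev_was_space:
            continue  # dropped, whatever its background or border
        out.append(c)
        prev_was_space = is_space
    return out


def mergeable(a, b):
    """Glyph merging, by contrast, requires matching traits
    (reduced here to the single background trait)."""
    return a.background == b.background


green_space = Char("\u0020", "green")
red_space = Char("\u0020", "red")
collapsed = collapse_spaces([Char("a"), green_space, red_space, Char("b")])
```

In this sketch the red space disappears although the two spaces would 
not count as mergeable, which is exactly the asymmetry described above.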

> Step 3. Line building: white-space-treatment and
> suppress-at-linebreak
> ======================================================================
>
> I agree that the references to the refinement stage are probably
> editorial mistakes.
>
> Issue 1. As for white-space-collapse, the glyph areas are deleted,
> and glyph merging is not applicable.
>
I agree with that interpretation - just not sure it really captures well 
what a user may expect - see examples above.

> Issue 2. Here is a difference between FO 1.0 and 1.1. In 1.0 the flow
> objects were deleted at the refinement stage. Therefore they cannot
> contribute to line breaking. In 1.1 the glyph areas are deleted at
> the line building stage. Therefore they could contribute to line
> breaking. I do not think that this is intended, and they should not
> contribute to line breaking. This is in line with my opinion that the
> values preserve and ignore should not really be in the same property
> as suppression around linebreaks, and should be taken care of in the
> refinement stage.
>
Again I agree fully with you, and the current implementation shows that 
issue: we deal with white-space-treatment twice, once during refinement 
and once again during line building. Andreas commented on that as well. 
But I think that is how it has to be for the time being.

> Example 2
> =========
>
> The space in "<fo:block>.<fo:block>" is suppressed because it is at
> the start of the block. 
Interesting - I agree that this is the intention, but you don't find that 
sentence in the spec. In 1.1 this is covered by the "deleting spaces at 
the beginning of a line" under white-space-treatment / line building. 
Again the discussion is probably academic - we all agree what the 
expected outcome is. Whether we can derive that outcome from the spec 
is a very interesting discussion but won't change what we will do.

> And "<fo:block><fo:block>" does not generate 
> an empty line. <fo:block> starts a new line, but that is not
> equivalent to a linefeed. When at the start of the nested fo:block
> there is no content in the line yet, it starts the same line. A
> similar thing happens in the case of "</fo:block>&#x0A;</fo:block>",
> which was discussed in an email thread.
I assume you mean the discussion under linefeed-treatment="preserve". I 
am still confused about that: will
</fo:block>&#x0A;&#x0A;</fo:block> 
generate one linefeed, or should this also create none?

>
> Example 3
> =========
>
> Jörg asked the same in this email thread:
> http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]
>ache.org&by=thread&from=561781, entitled "Suppression of leading
> space".
>
> <fo:block background-color="red" font-size="20pt">
>   <fo:inline background-color="blue" font-size="10pt">foo
>   </fo:inline><fo:inline background-color="green"
>    font-size="15pt"> bar</fo:inline></fo:block>
>
> <fo:block background-color="red" font-size="20pt">.
> ..<fo:inline background-color="blue" font-size="10pt">foo.
> ..</fo:inline><fo:inline background-color="green".
> ...font-size="15pt">.bar</fo:inline></fo:block>
>
> <fo:block background-color="red" font-size="20pt">.
> <fo:inline background-color="blue" font-size="10pt">foo.
> </fo:inline><fo:inline background-color="green"
> font-size="15pt">.bar</fo:inline></fo:block>
>
> and also believes that two spaces remain.

I think there is general agreement on this now. It may be helpful to 
review the test case block_white-space-collapse_1.xml and the generated 
PDF output. It demonstrates IMO how spaces are collapsed even across 
fo:inlines if they appear at the start/end of the line but preserved if 
surrounded by other text. I have attached the generated PDF.
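
As a rough illustration of that behaviour, the following Python sketch 
trims spaces at the start and end of one line while looking through 
inline boundaries. It is not the FOP implementation. The (name, text) 
run model and the function name trim_line are assumptions made for this 
example, and it covers only the line-edge suppression, not 
white-space-collapse.

```python
def trim_line(runs):
    """runs: ordered list of (inline_name, text) pairs forming one line.
    Returns the runs with U+0020 removed from the start and end of the
    line, even when those spaces sit in different inlines."""
    # Flatten to (run_index, char) so inline boundaries cannot hide spaces.
    chars = [(i, ch) for i, (_, text) in enumerate(runs) for ch in text]

    start = 0
    while start < len(chars) and chars[start][1] == "\u0020":
        start += 1
    end = len(chars)
    while end > start and chars[end - 1][1] == "\u0020":
        end -= 1

    # Regroup the surviving characters by the run they came from.
    result = []
    last_index = None
    for i, ch in chars[start:end]:
        if i == last_index:
            name, text = result[-1]
            result[-1] = (name, text + ch)
        else:
            result.append((runs[i][0], ch))
            last_index = i
    return result


line = [("block", " "), ("blue", "foo "), ("green", " bar"), ("block", " ")]
trimmed = trim_line(line)
```

Here the spaces at both edges of the line are dropped, including the 
runs that held nothing else, while the spaces between "foo" and "bar" 
stay because they are surrounded by other text.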

>
> As to the border of the inline on the next line, I think indeed that
> a formatter should avoid it, as it may be considered as a bad layout
> choice.

I agree. It seems we should treat a &#x20; after a starting border or 
before an ending border more like &#xa0;. Or, in terms of the Unicode 
line breaking algorithm, we should treat borders like matching 
parentheses, because UAX#14 does not break something like "[ 1234 ]".

>
> Processing Model 2
> ==================
>
> In steps 2 and 3 you apply the conditions of glyph area merging. I do
> not agree with that, as I explained above.
>
I will remove that - while I still think it may contradict user 
expectations it is what the spec seems to say.

> In step 3 eligible characters are all characters with
> suppress-at-line-break="true", by default only the space character.
Agreed

>
> Nowhere in the spec is a conversion of tabs and CRs to spaces
> specified.
Under 7.15.8 it says:

preserve

    Specifies that any character flow object whose character is 
classified, before any linefeed-treatment handling is considered, as 
white space in XML, except for U+000A (linefeed) characters, shall be 
converted during the refinement process into a character flow object 
whose Unicode code point is U+0020 (space).

Why this applies only to white-space-treatment="preserve" and not to 
other cases (e.g. under white-space-collapse, to the last remaining 
white space) is beyond me at the moment. Again, it seems everyone 
always does this replacement. So do all FOP versions, I believe.
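
The replacement described in the quoted text can be sketched in a few 
lines of Python. The function name normalize_white_space is made up, and 
the sketch works on a plain string rather than on character flow 
objects, so take it as an illustration of the rule only.

```python
# The four characters that XML classifies as white space.
XML_WHITE_SPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}


def normalize_white_space(text):
    """Replace every XML white space character except U+000A (linefeed)
    with U+0020 (space). Linefeeds are left for linefeed-treatment."""
    return "".join(
        "\u0020" if ch in XML_WHITE_SPACE and ch != "\u000A" else ch
        for ch in text
    )


normalized = normalize_white_space("a\tb\r\nc d")
```

Note that an XML parser normally turns a literal CR LF pair into a 
single LF before the formatter sees it, so a CR would usually reach 
this step only via a character reference such as &#x0D;.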

>
> In example 3, why is the space before 'Green' not deleted? It
> directly follows a line break (step 4b).
>
Because it wouldn't meet the 'eligible white space' criteria. But as you 
want this concept abolished (and I agree with that) it will be deleted 
once I have updated the Wiki page.

> Regards, Simon

Thanks again
Manuel

Attachment: block_white-space_1.xml.pdf
Description: Adobe PDF document
