On Tue, 8 Nov 2005 04:40 am, Simon Pepping wrote:
> I have taken my time, but here is my reaction to the Wiki page on
> white space handling. In addition, I have written my own view on the
> XSL-FO spec's handling of white space in a Wiki page.

Simon, your efforts are very much appreciated, at least by me. Your Wiki page looks at white space handling from a different angle (paraphrased: editors modify the XML by adding spaces and linefeeds, and white space handling exists mainly to deal with those modifications). I think that is a very good perspective to take.

> Step 2. Refinement: white-space-collapse
> ========================================
>
> Issue 1. The spec intentionally addresses only XML white space,
> because only such white space is manipulated by editors to obtain
> pretty printing.

Point taken, although I have no experience with non-Western editors. Do they all use 0x20 for pretty printing?

> Issue 2. The spec intentionally addresses only the collapse of white
> space around linefeed characters, because only such white space is
> manipulated by editors to obtain pretty printing. Even if linefeed
> characters indicate real line breaks and are preserved, it is
> possible that the editor has introduced sequences of XML white space
> characters for pretty printing.

OK.

> Issue 3. White-space-collapse is formulated in terms of space
> characters which do not generate an area. That is similar to the
> space resolution rules, where space specifiers get a zero width.
> Since there is no merging of white space glyph areas into a single
> area, there is no contradiction with the condition for glyph merging
> in section 4.7.2. The space glyph area that does generate an area
> determines the traits of that area.

Yes, but my point was this: if someone writes

<fo:character background-color="green" character="ሴ"/><fo:character background-color="red" character="䌡"/>

and ሴ and 䌡 are mergeable according to the rules of the script, then we are not allowed to merge them, because they don't have matching traits.

But if someone writes

<fo:character background-color="green" character=" "/><fo:character background-color="red" character=" "/>

these would be collapsed under the white space rules regardless of their differing backgrounds. Here is a more extreme example:

<fo:character border="solid ...." character=" "/>

Under white space collapse, the whole fo:character, border included, disappears. If you write

<fo:inline border="solid ...."> </fo:inline>

at least the border is retained, and whether the space survives depends on whether the sequence is at the beginning or end of a line.

Anyway, it is a bit academic, as the spec is quite clear: if the Unicode value is U+0020, whether in an fo:character (during refinement) or in a glyph area (during line building), it is subject to white space handling independent of any other properties or traits defined on it.

> Step 3. Line building: white-space-treatment and
> suppress-at-line-break
> =====================================================================
>
> I agree that the references to the refinement stage are probably
> editorial mistakes.
>
> Issue 1. As for white-space-collapse, the glyph areas are deleted,
> and glyph merging is not applicable.

I agree with that interpretation; I am just not sure it captures well what a user may expect (see the examples above).

> Issue 2. Here is a difference between FO 1.0 and 1.1. In 1.0 the flow
> objects were deleted at the refinement stage. Therefore they cannot
> contribute to line breaking. In 1.1 the glyph areas are deleted at
> the line building stage. Therefore they could contribute to line
> breaking. I do not think that this is intended, and they should not
> contribute to line breaking. This is in line with my opinion that the
> values preserve and ignore should not really be in the same property
> as suppression around linebreaks, and should be taken care of in the
> refinement stage.

Again, I agree fully with you, and the current implementation shows that issue.
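
To make the trait question concrete, here is a minimal Python sketch of refinement-stage collapse. It assumes a simplified reading of white-space-collapse="true" (each run of non-linefeed XML white space keeps only its first character, and a run touching a linefeed is dropped entirely), not the spec's exact wording; `CharFO`, `is_collapsible` and `collapse` are hypothetical names, not FOP classes. It illustrates the point above: the decision looks only at the code point, so a space carrying its own background or border is deleted like any other space.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# XML 1.0 white space (production S): space, tab, carriage return, linefeed.
XML_WHITE_SPACE = {" ", "\t", "\r", "\n"}


@dataclass
class CharFO:
    """Hypothetical stand-in for a refined fo:character: a code point plus traits."""
    char: str
    traits: dict = field(default_factory=dict)


def is_collapsible(c: CharFO) -> bool:
    # Only the code point matters; traits are never consulted.
    return c.char in XML_WHITE_SPACE and c.char != "\n"


def collapse(chars: list[CharFO]) -> list[CharFO]:
    """Simplified model of white-space-collapse="true" (an assumption):
    each run of non-linefeed XML white space keeps only its first character,
    and a run that touches a linefeed is dropped entirely."""
    out: list[CharFO] = []
    i, n = 0, len(chars)
    while i < n:
        if not is_collapsible(chars[i]):
            out.append(chars[i])
            i += 1
            continue
        # Find the end of this run of collapsible white space.
        j = i
        while j < n and is_collapsible(chars[j]):
            j += 1
        touches_linefeed = (i > 0 and chars[i - 1].char == "\n") or (
            j < n and chars[j].char == "\n"
        )
        if not touches_linefeed:
            out.append(chars[i])  # the first character of the run survives
        i = j
    return out


# Green and red spaces: the red one is deleted despite its different traits.
green = CharFO(" ", {"background-color": "green"})
red = CharFO(" ", {"background-color": "red"})
result = collapse([CharFO("a"), green, red, CharFO("b")])

# A bordered space next to a linefeed disappears along with its border.
bordered = CharFO(" ", {"border": "solid"})
result2 = collapse([CharFO("x"), bordered, CharFO("\n"), CharFO("y")])
```

The surviving green space keeps its own traits; converting it to U+0020 or suppressing it later at a line break would be separate steps that this sketch does not model.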

We deal with white-space-treatment twice: once during refinement and again during line building. Andreas commented on that as well. But I think that is how it has to be for the time being.

> Example 2
> =========
>
> The space in "<fo:block>.<fo:block>" is suppressed because it is at
> the start of the block.

Interesting. I agree that this is the intention, but you won't find that sentence in the spec. In 1.1 this is covered by "deleting spaces at the beginning of a line" under white-space-treatment / line building. Again, the discussion is probably academic: we all agree on the expected outcome. Whether we can derive that outcome from the spec is a very interesting discussion, but it won't change what we do.

> And "<fo:block><fo:block>" does not generate
> an empty line. <fo:block> starts a new line, but that is not
> equivalent to a linefeed. When at the start of the nested fo:block
> there is no content in the line yet, it starts the same line. A
> similar thing happens in the case of "</fo:block>
> </fo:block>", which was discussed in an email thread.

I assume you mean the discussion under linefeed-treatment="preserve". I am still confused about that, because

</fo:block>
</fo:block>

will generate one linefeed. Or should this also create none?

> Example 3
> =========
>
> Jörg asked the same in this email thread:
> http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]ache.org&by=thread&from=561781,
> entitled "Suppression of leading space".
>
> <fo:block background-color="red" font-size="20pt">
> <fo:inline background-color="blue" font-size="10pt">foo
> </fo:inline><fo:inline background-color="green"
> font-size="15pt"> bar</fo:inline></fo:block>
>
> <fo:block background-color="red" font-size="20pt">.
> ..<fo:inline background-color="blue" font-size="10pt">foo.
> ..</fo:inline><fo:inline background-color="green".
> ...font-size="15pt">.bar</fo:inline></fo:block>
>
> <fo:block background-color="red" font-size="20pt">.
> <fo:inline background-color="blue" font-size="10pt">foo.
> </fo:inline><fo:inline background-color="green"
> font-size="15pt">.bar</fo:inline></fo:block>
>
> and also believes that two spaces remain.

I think there is general agreement on this now. It may be helpful to review the test case block_white-space-collapse_1.xml and the generated PDF output. In my opinion it demonstrates how spaces are collapsed even across fo:inlines when they appear at the start or end of a line, but preserved when surrounded by other text. I have attached the generated PDF.

> As to the border of the inline on the next line, I think indeed that
> a formatter should avoid it, as it may be considered as a bad layout
> choice.

I agree. It seems we should treat a space after a starting border or before an ending border more like a no-break space (&#xA0;). Or, in terms of the Unicode line breaking algorithm, we should treat borders like matching parentheses, because UAX #14 does not break something like "[ 1234 ]".

> Processing Model 2
> ==================
>
> In steps 2 and 3 you apply the conditions of glyph area merging. I do
> not agree with that, as I explained above.

I will remove that. While I still think it may contradict user expectations, it is what the spec seems to say.

> In step 3 eligible characters are all characters with
> suppress-at-line-break="true", by default only the space character.

Agreed.

> Nowhere in the spec is a conversion of tabs and CRs to spaces
> specified.

Under 7.15.8 it says:

preserve
Specifies that any character flow object whose character is classified, before any linefeed-treatment handling is considered, as white space in XML, except for U+000A (linefeed) characters, shall be converted during the refinement process into a character flow object whose Unicode code point is U+0020 (space).

Why this applies only to white-space-treatment="preserve" and not to other cases (e.g. the last remaining white space character after white-space-collapse) is beyond me at the moment. Again, it seems everyone always performs this replacement. So do all FOP versions, I believe.

> In example 3, why is the space before 'Green' not deleted? It
> directly follows a line break (step 4b).

Because it wouldn't meet the 'eligible white space' criteria. But as you want this concept abolished (and I agree with that), that space will be deleted once I have updated the Wiki page.

> Regards, Simon

Thanks again

Manuel