Re: Leading/trailing space removal in LineLM

Luca Furini Wed, 02 Nov 2005 04:20:46 -0800

Manuel Mall wrote:

So we end up with only two cases to consider: preserve white space and remove white space around a line break created by the Knuth algorithm.
1. Preserve white space: IMO in this case the space itself is actually not a break opportunity but there are now two break opportunities: one before the space and one after the space. That is a sequence like 'abc def' is more like 'abc def' or in a more readable notation 'abc<zwsp><nbsp><zwsp>def'. That is our normal space becomes a non-breakable space flanked by zero-width spaces which represent the break opportunities. If this is correct the Knuth elements would look like:
 glue w=0
 box w=0
 pen +INFINITE
 glue w=<space>
 pen
 glue w=0
Is this sequence correct? The first and last glue represent the <zwsp> and are break opportunities. The box prevents the removal of the space if a break is created before the space. The penalty prevents the space from being considered as a break opportunity. Of course, as usual, these sequences are further complicated in the absence of justification and in the presence of border/padding.

I like your idea of "expanding" a preserved space into zwsps and nbsp; this allows us to forget about alignments and borders / padding, as we just have to insert the appropriate elements for the non-breaking space.


The sequence is very good, as it has a couple of interesting properties:

- it interacts with the surrounding elements just as a single glue element would

- if there are two (or more) consecutive, non-collapsed spaces the sequence has just 3 feasible breaks, not 4
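
To check the second property, here is a minimal Python sketch that builds the quoted sequence and counts its feasible breaks. It assumes the classic Knuth-Plass legality rules (a glue is a legal break only when directly preceded by a box, a penalty only when its value is below infinity) and it assumes that the bare "pen" in the quoted sequence is a finite penalty of 0. The class names, the widths and the helper functions are hypothetical and are not taken from the FOP code.

```python
from dataclasses import dataclass

INFINITE = 1000  # stand-in for "+INFINITE"; the actual number is arbitrary


@dataclass(frozen=True)
class Box:
    w: int


@dataclass(frozen=True)
class Glue:
    w: int


@dataclass(frozen=True)
class Penalty:
    p: int


def preserved_space(space_width):
    """Elements for one preserved space, in the order given in the quoted mail."""
    return [
        Glue(0),            # <zwsp> before the space
        Box(0),             # keeps the space from being discarded after a break
        Penalty(INFINITE),  # forbids a break directly before the space glue
        Glue(space_width),  # the space itself
        Penalty(0),         # ASSUMPTION: the mail shows a bare "pen"; 0 is a guess
        Glue(0),            # <zwsp> after the space (the break is taken at the penalty)
    ]


def feasible_breaks(elements):
    """Indices of legal breaks under the classic Knuth-Plass rules."""
    breaks = []
    for i, el in enumerate(elements):
        if isinstance(el, Penalty) and el.p < INFINITE:
            breaks.append(i)
        elif isinstance(el, Glue) and i > 0 and isinstance(elements[i - 1], Box):
            breaks.append(i)
    return breaks


def paragraph(n_spaces, space_width=5):
    """'abc', then n_spaces preserved spaces, then 'def', as Knuth elements."""
    seq = [Box(15)]
    for _ in range(n_spaces):
        seq += preserved_space(space_width)
    seq.append(Box(15))
    return seq
```

Under these assumptions `feasible_breaks(paragraph(1))` finds two breaks (before and after the space) and `feasible_breaks(paragraph(2))` finds three: the trailing zero-width glue of the first space and the leading one of the second are adjacent, so the second is not preceded by a box and adds no extra break.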

However, I have a doubt: reading the Unicode document about line breaking, it seems to me that, regardless of the quantity of consecutive spaces, there is only *one* feasible break, after the last one (Unicode Standard Annex #14, section 2 "Definitions", in particular the definition of "direct break" and "indirect break").


--- begin quoted text ---

Direct Break - a line break opportunity exists between two adjacent characters of the given line breaking classes. This is indicated in the rules below as B ÷ A, where B is the character class of the character before and A is the character class of the character after the break. If they are separated by one or more space characters, a break opportunity also exists after the last space. In the pair table, the optional space characters are not shown.

Indirect Break - a line break opportunity exists between two characters of the given line breaking classes only if they are separated by one or more spaces. In this case, a break opportunity exists after the last space. No break opportunity exists if the characters are immediately adjacent. This is indicated in the pair table below as B % A, where B is the character class of the character before and A is the character class of the character after the break. Even though space characters are not shown in the pair table, an indirect break can only occur if one or more spaces follow B. In the notation of the rules in Section 6, Line Breaking Algorithm this would be represented as two rules: B × A and B SP+ ÷ A.


--- end quoted text ---
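
The reading above, one break opportunity per run of spaces, located after the last space, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not an implementation of UAX #14: it looks only at U+0020 and ignores every other line breaking class and rule. The function name is hypothetical.

```python
def space_break_opportunities(text):
    """Offsets at which a line may start, considering only runs of U+0020.

    A break opportunity is reported only after the last space of a run,
    never between two consecutive spaces and never before the first one.
    """
    breaks = []
    for i in range(1, len(text)):
        if text[i - 1] == " " and text[i] != " ":
            breaks.append(i)
    return breaks
```

For example, `space_break_opportunities("abc   def")` reports a single offset, the position of `d`, however many spaces separate the two words.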

I still have not read the document from top to bottom, and I could have misunderstood even the sections I read :-), but I think this point must be clarified before we continue.


Regards
    Luca
