Re: Leading/trailing space removal in LineLM

Manuel Mall Thu, 03 Nov 2005 01:58:00 -0800

On Wed, 2 Nov 2005 11:58 pm, Luca Furini wrote:
> Manuel Mall wrote:
> > Luca wrote a longer response to this, but my mail reader doesn't
> > like the character set (is that topical or what?).
>
> Sorry, it looks really horrible ... still don't know what went wrong,
> but I won't do it again! :-)
>
> > Anyway, at the end Luca asks the question about the UAX#14 line
> > breaking algorithm and its handling of spaces. My answer to that is:
> > a) Yes, UAX#14 always breaks at the end of a sequence of spaces.
> > b) But it also says that it assumes any trailing spaces in a line
> > are removed.
> > This "conflicts" with XSL-FO, which can force spaces to be retained,
> > so adjustments to the algorithm are necessary to cater for that.
> > One possible adjustment is simply to change what is given to the
> > algorithm, as indicated above, i.e. <sp> becomes
> > <zwsp><nbsp><zwsp>.
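A minimal Java sketch of that substitution, assuming the UAX#14 breaker is fed a plain string and that only U+0020 needs rewriting (`SpaceRewriter` and `rewrite` are hypothetical names, not FOP API). UAX#14 allows a break after a ZWSP (class ZW) and treats the NBSP (class GL) as non-breaking, so each retained space gets break opportunities just before and just after it without looking like a trailing space the algorithm expects to be removed.

```java
/** Hypothetical pre-processing step (not FOP API) applied before UAX#14 line breaking. */
final class SpaceRewriter {
    static final char SPACE = '\u0020';
    static final char ZWSP = '\u200B'; // ZERO WIDTH SPACE, UAX#14 class ZW
    static final char NBSP = '\u00A0'; // NO-BREAK SPACE, UAX#14 class GL

    /** Replaces every U+0020 with ZWSP NBSP ZWSP; all other characters are copied unchanged. */
    static String rewrite(CharSequence text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == SPACE) {
                out.append(ZWSP).append(NBSP).append(ZWSP);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String r = rewrite("a  b");
        // Print code points so the invisible characters can be seen.
        r.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

For "a  b" it prints eight code points: each of the two spaces has become a three-character ZWSP NBSP ZWSP group.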
>
> Ok, so back to your previous message:
> > 2. Removal of white space: This is the current behaviour, but it
> > works only for a single space and not for a sequence of spaces.
> > Actually, because the algorithm removes leading glues/penalties, it
> > is mainly a problem for trailing white space. I am not sure how to
> > best tackle this. What comes to mind is:
> >
> > a) Do the same as for leading glues/penalties at the end of the
> > line. However, I am not sure how tricky it would be to determine the
> > boundary, because any 'blocking boxes' (see 1. above) are only
> > placed before elements, not after them. This option suffers from
> > the problem that it will not remove leading/trailing white space
> > across inline boundaries with border/padding, as these generate
> > zero-width boxes to block removal of the glue elements for the
> > border/padding.
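A simplified Java sketch of option a), assuming a line is just a list of boxes, glues and penalties; the `Element` record is illustrative and not FOP's KnuthElement hierarchy. Glues and penalties are dropped from the end of the line until the first box, so a zero-width box, like the ones generated for border/padding, stops the removal and the space in front of it survives.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of option a); these are not FOP's KnuthElement classes. */
final class TrailingTrim {
    enum Kind { BOX, GLUE, PENALTY }

    record Element(Kind kind, int width, String note) {
        static Element box(int w, String note) { return new Element(Kind.BOX, w, note); }
        static Element glue(int w, String note) { return new Element(Kind.GLUE, w, note); }
    }

    /** Drops glues and penalties from the end of a line until the first box, whatever its width. */
    static List<Element> trimTrailing(List<Element> line) {
        List<Element> out = new ArrayList<>(line);
        while (!out.isEmpty() && out.get(out.size() - 1).kind() != Kind.BOX) {
            out.remove(out.size() - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // Widths in arbitrary units.
        List<Element> line = List.of(
                Element.box(20, "word"),
                Element.glue(5, "space"),
                Element.box(0, "zero-width box for border/padding"),
                Element.glue(5, "space"));
        trimTrailing(line).forEach(System.out::println);
    }
}
```

Running it keeps the word, the first space and the zero-width box; only the space after the box is removed.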
> >
> > b) Do not generate individual Knuth sequences for each white space
> > character, but instead collect all consecutive white space and
> > create one glue-penalty sequence for it. Again, I am uncertain of
> > the consequences of doing that. To do it correctly we would need
> > to collect white space across inline boundaries. Firstly, this
> > breaks the current getNextKnuth approach, which assumes each LM can
> > generate its sequences without knowledge of its neighbours. It
> > would also break the current area info structures, as a single Knuth
> > element could now refer to text snippets from different LMs.
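A minimal Java sketch of the collecting step in option b), assuming each LM's text is available as a plain string fragment (the `Fragment` and `SpaceRef` records and the LM labels are hypothetical). Runs of U+0020 are gathered across fragment boundaries, and every space keeps a reference to the LM it came from, which is the kind of link a Position/AreaInfo pair would need to preserve.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of option b): group consecutive spaces across inline (LM) boundaries. Names are hypothetical. */
final class SpaceCollector {
    /** Text contributed by one layout manager, identified here by a plain label. */
    record Fragment(String lm, String text) {}

    /** One space character and where it came from (a stand-in for an AreaInfo reference). */
    record SpaceRef(String lm, int offset) {}

    /** Returns each maximal run of consecutive U+0020 spaces, possibly spanning several fragments. */
    static List<List<SpaceRef>> collectRuns(List<Fragment> fragments) {
        List<List<SpaceRef>> runs = new ArrayList<>();
        List<SpaceRef> current = new ArrayList<>();
        for (Fragment f : fragments) {
            for (int i = 0; i < f.text().length(); i++) {
                if (f.text().charAt(i) == ' ') {
                    current.add(new SpaceRef(f.lm(), i));
                } else if (!current.isEmpty()) {
                    runs.add(current);          // a non-space character ends the run
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            runs.add(current);
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = List.of(
                new Fragment("outer", "foo "),
                new Fragment("inline", " bar "),
                new Fragment("outer", "baz"));
        for (List<SpaceRef> run : collectRuns(fragments)) {
            System.out.println(run.size() + " space(s): " + run);
        }
    }
}
```

For the fragments "foo ", " bar " and "baz" it finds two runs: two spaces spanning the outer and inline LMs, then a single space inside the inline. Each run would then become one glue-penalty sequence rather than one per character.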
>
> I'm not sure I follow you in all the details of white space handling,
> and here we have borders too ... :-)
>
> I like b) best: after all, this is somewhat similar to space
> resolution, as we have interactions between spaces coming from
> different nodes, and it's difficult to have each LM decide on its
> own. And I think we could find a way to keep the 1-1 relationship
> between AreaInfo objects and Positions.
>
> I have played with the elements a bit, and here are a few results; I
> hope they help!
>
> At the moment, the sequence for a single space with borders and
> padding is:
>
> 1  glue w=endB&P
> 2  penalty w=0
> 3  glue w=(spaceIPD - endB&P - startB&P)
> 4  box w=0
> 5  infinite penalty
> 6  glue w=startB&P
>
> total width = spaceIPD
> if break at #2 = endB&P / startB&P
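To check the arithmetic, here is a toy Java model of the six elements above (the `El` record and the `endOfLine`/`startOfNextLine` helpers are assumptions for illustration, not FOP classes). It uses simplified Knuth rules: this model only breaks at a finite penalty, material before the break stays on the line, and glues and penalties after it are discarded up to the first box.

```java
import java.util.List;

/** Toy Knuth model (not FOP classes) for the single-space sequence with border/padding. */
final class SingleSpaceCheck {
    enum Kind { BOX, GLUE, PENALTY }

    /** breakable is true only for a finite penalty; the infinite penalty is modelled as false. */
    record El(Kind kind, int width, boolean breakable) {
        static El box(int w) { return new El(Kind.BOX, w, false); }
        static El glue(int w) { return new El(Kind.GLUE, w, false); }
        static El penalty(boolean breakable) { return new El(Kind.PENALTY, 0, breakable); }
    }

    static List<El> singleSpace(int spaceIPD, int startBP, int endBP) {
        return List.of(
                El.glue(endBP),                      // #1
                El.penalty(true),                    // #2 penalty w=0
                El.glue(spaceIPD - endBP - startBP), // #3
                El.box(0),                           // #4 stops removal at the start of the next line
                El.penalty(false),                   // #5 infinite penalty
                El.glue(startBP));                   // #6
    }

    /** Width when no break is taken: boxes and glues count, penalties do not. */
    static int total(List<El> seq) {
        int sum = 0;
        for (El e : seq) {
            if (e.kind() != Kind.PENALTY) {
                sum += e.width();
            }
        }
        return sum;
    }

    /** Width left at the end of the line when breaking at element n (1-based, as in the listing). */
    static int endOfLine(List<El> seq, int n) {
        El brk = seq.get(n - 1);
        if (!brk.breakable()) {
            throw new IllegalArgumentException("#" + n + " is not a legal break");
        }
        return total(seq.subList(0, n - 1)) + brk.width();
    }

    /** Width at the start of the next line after discarding glues/penalties up to the first box. */
    static int startOfNextLine(List<El> seq, int n) {
        int i = n; // 0-based index of the element right after break #n
        while (i < seq.size() && seq.get(i).kind() != Kind.BOX) {
            i++;
        }
        return total(seq.subList(i, seq.size()));
    }

    public static void main(String[] args) {
        // Arbitrary units: spaceIPD = 5, startB&P = 2, endB&P = 1.
        List<El> seq = singleSpace(5, 2, 1);
        System.out.println("total = " + total(seq));
        System.out.println("break at #2: " + endOfLine(seq, 2) + " / " + startOfNextLine(seq, 2));
    }
}
```

With those widths it prints `total = 5` and `break at #2: 1 / 2`, i.e. spaceIPD without a break and endB&P / startB&P with one.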
>
> If we have two (or more) spaces, we could use the sequence:
>
> 1  glue w=endB&P
> 2  penalty w=0
> 3  glue w=(- endB&P - startB&P)
> 4  glue w=spaceIPD1
> 5  glue w=spaceIPD2
> 6  box w=0
> 7  infinite penalty
> 8  glue w=startB&P
>
> total width = spaceIPD1 + spaceIPD2
> if break at #2 = endB&P / startB&P
>
> Glues #4 and #5 have a Position pointing to different AreaInfo
> objects (from different LMs). This should solve (?) the case of
> ignore-if-surrounding.
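The same toy model (re-declared so the block stands alone; all names are hypothetical) can build this sequence for any run of spaces, with each space glue carrying a label for the LM it came from as a stand-in for its Position. Glue #3 cancels glue #1 and the final startB&P glue, so the unbroken width is just the sum of the spaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy Knuth model (not FOP classes) for the ignore-if-surrounding sequence over a run of spaces. */
final class SurroundingRun {
    enum Kind { BOX, GLUE, PENALTY }

    /** source labels the LM a space glue came from (stand-in for its Position); null otherwise. */
    record El(Kind kind, int width, boolean breakable, String source) {}

    static List<El> build(int[] spaceWidths, String[] sources, int startBP, int endBP) {
        if (spaceWidths.length != sources.length) {
            throw new IllegalArgumentException("one source per space");
        }
        List<El> seq = new ArrayList<>();
        seq.add(new El(Kind.GLUE, endBP, false, null));              // #1
        seq.add(new El(Kind.PENALTY, 0, true, null));                // #2 the only break in this model
        seq.add(new El(Kind.GLUE, -(endBP + startBP), false, null)); // #3
        for (int i = 0; i < spaceWidths.length; i++) {
            seq.add(new El(Kind.GLUE, spaceWidths[i], false, sources[i])); // one glue per space
        }
        seq.add(new El(Kind.BOX, 0, false, null));                   // stops removal after a break
        seq.add(new El(Kind.PENALTY, 0, false, null));               // infinite penalty
        seq.add(new El(Kind.GLUE, startBP, false, null));
        return seq;
    }

    static int total(List<El> seq) {
        int sum = 0;
        for (El e : seq) {
            if (e.kind() != Kind.PENALTY) {
                sum += e.width();
            }
        }
        return sum;
    }

    static int endOfLine(List<El> seq, int n) {
        El brk = seq.get(n - 1);
        if (!brk.breakable()) {
            throw new IllegalArgumentException("#" + n + " is not a legal break");
        }
        return total(seq.subList(0, n - 1)) + brk.width();
    }

    static int startOfNextLine(List<El> seq, int n) {
        int i = n;
        while (i < seq.size() && seq.get(i).kind() != Kind.BOX) {
            i++;
        }
        return total(seq.subList(i, seq.size()));
    }

    public static void main(String[] args) {
        // Arbitrary units: two spaces of different widths from two LMs, startB&P = 2, endB&P = 1.
        List<El> seq = build(new int[] {5, 4}, new String[] {"outer", "inline"}, 2, 1);
        System.out.println("total = " + total(seq));
        System.out.println("break at #2: " + endOfLine(seq, 2) + " / " + startOfNextLine(seq, 2));
    }
}
```

With two spaces of widths 5 and 4 it prints `total = 9` and `break at #2: 1 / 2`.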


Excellent, because ignore-if-surrounding is the only case we have to
consider. For formatter-generated line breaks, ignore-if-after... and
ignore-if-before... amount to the same thing: because we control the
position of the line break, we can logically place it so that the
spaces can be removed in the before and after cases as well. Therefore,
IMO, we don't need any other Knuth sequences.

However, as these are "integrated sequences", we still have to carry
information about them between LMs. This is "for further study", and
suggestions are welcome.

>
> If white-space-treatment is ignore-if-after and we have two
> consecutive spaces, we could use the sequence:
>
> 1  glue w=endB&P
> 2  penalty w=0
> 3  glue w=(spaceIPD - endB&P)
> 4  penalty w=0
> 5  glue w=(spaceIPD - startB&P)
> 6  box w=0
> 7  infinite penalty
> 8  glue w=startB&P
>
> total width = 2 * spaceIPD
> if break at #2 = endB&P / startB&P
> if break at #4 = endB&P + spaceIPD / startB&P
>
> With three or more consecutive spaces:
> 1  glue w=endB&P
> 2  penalty w=0
> 3  glue w=(spaceIPD - endB&P)
> 4  penalty w=0
> 5  glue w=spaceIPD
> 6  penalty w=0
> 7  glue w=(spaceIPD - startB&P)
> 8  box w=0
> 9  infinite penalty
> 10 glue w=startB&P
>
> total width = 3 * spaceIPD
> if break at #2 = endB&P / startB&P
> if break at #4 = endB&P + spaceIPD / startB&P
> if break at #6 = endB&P + 2 * spaceIPD / startB&P
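A toy Java generator for the ignore-if-after pattern, again with hypothetical names and a simplified Knuth model (breaks only at the w=0 penalties): n spaces of width spaceIPD, a legal break before each one, the first space glue reduced by endB&P and the last by startB&P.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy Knuth model (not FOP classes) for the ignore-if-after pattern with n consecutive spaces. */
final class IgnoreIfAfterRun {
    enum Kind { BOX, GLUE, PENALTY }

    /** breakable is true only for the w=0 penalties; the infinite penalty is modelled as false. */
    record El(Kind kind, int width, boolean breakable) {}

    /** glue endB&P, then per space: penalty + glue, then box w=0, infinite penalty, glue startB&P. */
    static List<El> build(int n, int spaceIPD, int startBP, int endBP) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one space");
        }
        List<El> seq = new ArrayList<>();
        seq.add(new El(Kind.GLUE, endBP, false));
        for (int i = 0; i < n; i++) {
            int w = spaceIPD;
            if (i == 0) {
                w -= endBP;   // first space glue absorbs the end border/padding
            }
            if (i == n - 1) {
                w -= startBP; // last space glue absorbs the start border/padding
            }
            seq.add(new El(Kind.PENALTY, 0, true));
            seq.add(new El(Kind.GLUE, w, false));
        }
        seq.add(new El(Kind.BOX, 0, false));
        seq.add(new El(Kind.PENALTY, 0, false)); // infinite penalty: never a break
        seq.add(new El(Kind.GLUE, startBP, false));
        return seq;
    }

    static int total(List<El> seq) {
        int sum = 0;
        for (El e : seq) {
            if (e.kind() != Kind.PENALTY) {
                sum += e.width();
            }
        }
        return sum;
    }

    static int endOfLine(List<El> seq, int n) {
        El brk = seq.get(n - 1);
        if (!brk.breakable()) {
            throw new IllegalArgumentException("#" + n + " is not a legal break");
        }
        return total(seq.subList(0, n - 1)) + brk.width();
    }

    static int startOfNextLine(List<El> seq, int n) {
        int i = n;
        while (i < seq.size() && seq.get(i).kind() != Kind.BOX) {
            i++;
        }
        return total(seq.subList(i, seq.size()));
    }

    public static void main(String[] args) {
        int n = 3;
        List<El> seq = build(n, 5, 2, 1); // arbitrary units: spaceIPD = 5, startB&P = 2, endB&P = 1
        System.out.println("total = " + total(seq));
        for (int k = 2; k <= 2 * n; k += 2) {
            System.out.println("break at #" + k + ": " + endOfLine(seq, k) + " / " + startOfNextLine(seq, k));
        }
    }
}
```

For n = 3 it prints a total of 15 (3 * spaceIPD) and startB&P = 2 at the start of the next line for every break. Summed exactly as listed, the material before a break at #4 is glue #1 + glue #3 = spaceIPD (5 here) rather than endB&P + spaceIPD, so those per-break figures may be worth rechecking against the intended glue widths. With n = 1 the generator produces the single-space sequence shown earlier.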
>
> I haven't found a sequence for ignore-if-before yet ...
>
> Regards
>     Luca

Cheers

Manuel

PS: I finally feel real progress is being made with this white space
handling stuff :-)
