Re: Leading/trailing space removal in LineLM

Luca Furini Wed, 02 Nov 2005 07:58:41 -0800

Manuel Mall wrote:

Luca wrote a longer response to this but my mail reader doesn't like the character set (is that topical or what?).

Sorry, it looks really horrible ... still don't know what went wrong, but I won't do it again! :-)

Anyway, at the end Luca asks about the UAX#14 line breaking algorithm and its handling of spaces. My answer to that is:
a) Yes, UAX#14 always breaks at the end of a sequence of spaces.
b) But it also says that it assumes any trailing spaces in a line are being removed.

This "conflicts" with XSL-FO, which can force spaces to be retained; therefore adjustments to the algorithm are necessary to cater for that. One possible adjustment is simply changing what is given to the algorithm as indicated above, i.e. <sp> becomes <zwsp><nbsp><zwsp>.
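
A minimal sketch of that substitution, assuming the text is preprocessed as a plain string before it is handed to the line breaker. The helper name protect_spaces is hypothetical and not part of FOP; only the two code points are standard Unicode.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: break opportunity, no width
NBSP = "\u00a0"  # NO-BREAK SPACE: keeps its width, no break opportunity


def protect_spaces(text: str) -> str:
    """Replace every U+0020 with <zwsp><nbsp><zwsp> (hypothetical helper).

    The space keeps its width as a no-break space, and the surrounding
    zero-width spaces supply the break opportunities, so a breaker that
    drops trailing spaces has nothing it is allowed to drop.
    """
    return text.replace(" ", ZWSP + NBSP + ZWSP)


if __name__ == "__main__":
    # Print the code points so the invisible characters can be seen.
    print([hex(ord(ch)) for ch in protect_spaces("a b")])
```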


Ok, so back to your previous message:

2. Removal of white space: This is the current behaviour but it works
only for a single space and not for a sequence of spaces. Actually
because the algorithm removes leading glues/penalties it is mainly a
problem for trailing white space. I am not sure how to best tackle
this. What comes to mind is:

a) Do the same as for leading glues/penalties at the end of the line.
However I am not sure how tricky it would be to determine the boundary
because any 'blocking boxes' (see 1. above) are only placed before but
not after elements. This option suffers from the problem that it will
not remove leading/trailing white space across inline boundaries with
border/padding as these generate zero width boxes to block removal of
the glue elements for the border/padding.

b) Do not generate individual Knuth sequences for each white space
character but instead collect all consecutive white space and create
one glue-penalty sequence for it. Again I am uncertain of the
consequences of doing that. To do that correctly we would need to
collect white space across inline boundaries. This firstly breaks the
current getNextKnuth approach which assumes each LM can generate its
sequences without knowledge of its neighbours. It would also break the
current area info structures as a single Knuth element could now refer
to text snippets from different LMs.
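
A rough sketch of what option b) implies, assuming each LM contributes a (name, text) snippet. The function names and the snippet representation are made up for illustration and do not reflect the real getNextKnuth interface. The point is only that a run of white space collected across an inline boundary ends up with characters owned by more than one LM.

```python
from itertools import groupby


def whitespace_runs(snippets):
    """Group consecutive characters into white space / non-white space runs.

    snippets: list of (lm_name, text) pairs in document order (hypothetical).
    Returns a list of (is_space, [(lm_name, char), ...]) tuples.
    """
    chars = [(lm, ch) for lm, text in snippets for ch in text]
    return [
        (is_space, list(group))
        for is_space, group in groupby(chars, key=lambda pair: pair[1].isspace())
    ]


def owners(run):
    """Names of the LMs that contributed characters to one run."""
    return sorted({lm for lm, _ in run})


if __name__ == "__main__":
    for is_space, run in whitespace_runs([("LM1", "a "), ("LM2", " b")]):
        print("space" if is_space else "text ", owners(run))
```

Here the middle run contains one space from each LM, which is exactly the case where a single glue-penalty sequence would have to refer back to two text snippets.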

I'm not sure I follow you in all the details of white space handling and here we have borders too ... :-)

I like b) most: after all, this is somewhat similar to the space resolution, as we have interactions between spaces coming from different nodes, and it's difficult to have each LM decide on its own. And I think we could find a way to keep the 1-1 relationship between AreaInfo objects and Positions.

I have tried to play with the elements, and here are a few results: I hope they can help!

At the moment, the sequence for a single space with borders and padding is:


1  glue w=endB&P
2  penalty w=0
3  glue w=(spaceIPD - endB&P - startB&P)
4  box w=0
5  infinite penalty
6  glue w=startB&P

total width = spaceIPD
if break at #2 = endB&P / startB&P
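
The two figures above can be checked with a few lines of arithmetic. This is only a sketch: the Elem type and the helper names are invented for the example, and "endB&P / startB&P" is read here as "width left at the end of the broken line / width at the start of the next line". After a break, glue and penalties are skipped up to the next box, mirroring the removal of leading glues/penalties mentioned earlier in the thread.

```python
from typing import List, NamedTuple, Tuple


class Elem(NamedTuple):
    kind: str   # "box", "glue" or "penalty"
    width: int


def single_space(space_ipd: int, end_bp: int, start_bp: int) -> List[Elem]:
    """The six-element sequence listed above, with concrete widths."""
    return [
        Elem("glue", end_bp),                         # 1
        Elem("penalty", 0),                           # 2 (legal break)
        Elem("glue", space_ipd - end_bp - start_bp),  # 3
        Elem("box", 0),                               # 4
        Elem("penalty", 0),                           # 5 (infinite: no break)
        Elem("glue", start_bp),                       # 6
    ]


def total_width(seq: List[Elem]) -> int:
    return sum(e.width for e in seq)


def split_at(seq: List[Elem], number: int) -> Tuple[int, int]:
    """Widths before and after a break at the 1-based element `number`."""
    i = number - 1
    before = sum(e.width for e in seq[:i])
    rest = seq[i + 1:]
    # Drop discardable elements (glue, penalties) up to the next box.
    while rest and rest[0].kind != "box":
        rest = rest[1:]
    return before, sum(e.width for e in rest)


if __name__ == "__main__":
    seq = single_space(space_ipd=5, end_bp=2, start_bp=3)
    print(total_width(seq), split_at(seq, 2))
```

With spaceIPD=5, endB&P=2 and startB&P=3 the unbroken width is 5, and a break at #2 leaves 2 on the first line and 3 on the next.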

If we have two (or more) spaces, we could use the sequence:

1  glue w=endB&P
2  penalty w=0
3  glue w=(- endB&P - startB&P)
4  glue w=spaceIPD1
5  glue w=spaceIPD2
6  box w=0
7  infinite penalty
8  glue w=startB&P

total width = spaceIPD1 + spaceIPD2
if break at #2 = endB&P / startB&P

Glues #4 and #5 have a Position pointing to different AreaInfo objects (from different LMs). This should solve (?) the case of ignore-if-surrounding.

If white-space-treatment is ignore-if-after, and we have two consecutive spaces we could use the sequence:


1  glue w=endB&P
2  penalty w=0
3  glue w=(spaceIPD - endB&P)
4  penalty w=0
5  glue w=(spaceIPD - startB&P)
6  box w=0
7  infinite penalty
8  glue w=startB&P

total width = 2 * spaceIPD
if break at #2 = endB&P / startB&P
if break at #4 = endB&P + spaceIPD / startB&P

With three or more consecutive spaces:

1  glue w=endB&P
2  penalty w=0
3  glue w=(spaceIPD - endB&P)
4  penalty w=0
5  glue w=spaceIPD
6  penalty w=0
7  glue w=(spaceIPD - startB&P)
8  box w=0
9  infinite penalty
10 glue w=startB&P

total width = 3 * spaceIPD
if break at #2 = endB&P / startB&P
if break at #4 = endB&P + spaceIPD / startB&P
if break at #6 = endB&P + 2 * spaceIPD / startB&P
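
The two- and three-space listings follow the same pattern, so the sequence for n spaces can be generated mechanically. The sketch below is an extrapolation from those two listings, not code from FOP: the function names are hypothetical, every space is assumed to have the same spaceIPD, and for n = 1 the result collapses to the single-space sequence given earlier. It only builds the sequence and checks the total width and the legal break positions.

```python
def ignore_if_after_sequence(n, space_ipd, end_bp, start_bp):
    """Build the element list for n consecutive spaces as (kind, width) pairs."""
    if n < 1:
        raise ValueError("need at least one space")
    widths = [space_ipd] * n
    widths[0] -= end_bp     # first space glue compensates for endB&P
    widths[-1] -= start_bp  # last space glue compensates for startB&P
    seq = [("glue", end_bp)]
    for w in widths:
        seq.append(("penalty", 0))  # legal break before each space glue
        seq.append(("glue", w))
    seq += [("box", 0), ("infinite penalty", 0), ("glue", start_bp)]
    return seq


def total_width(seq):
    return sum(width for _, width in seq)


def break_numbers(seq):
    """1-based numbers of the elements where a break is allowed."""
    return [i + 1 for i, (kind, _) in enumerate(seq) if kind == "penalty"]


if __name__ == "__main__":
    seq = ignore_if_after_sequence(3, space_ipd=5, end_bp=2, start_bp=3)
    for number, (kind, width) in enumerate(seq, start=1):
        print(number, kind, width)
    print("total:", total_width(seq), "breaks at:", break_numbers(seq))
```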

I did not find a sequence for ignore-if-before yet ...

Regards
   Luca
