Manuel Mall wrote:

Luca wrote a longer response to this but my mail reader doesn't like the character set (is that topical or what?).

Sorry, it looks really horrible ... still don't know what went wrong, but I won't do it again! :-)

Anyway, at the end Luca asks about the UAX#14 line breaking algorithm and its handling of spaces. My answer to that is:
a) Yes, UAX#14 always breaks at the end of a sequence of spaces.
b) But it also says that it assumes any trailing spaces in a line are removed. This "conflicts" with XSL-FO, which can force spaces to be retained, so adjustments to the algorithm are necessary to cater for that. One possible adjustment is simply changing what is given to the algorithm as indicated above, i.e. <sp> becomes <zwsp><nbsp><zwsp>.
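
A minimal sketch of that substitution, assuming it is applied to the text before it is handed to a UAX#14 implementation. The function name protect_spaces is hypothetical and only illustrates the idea; it is not part of FOP or of any line-breaking library.

```python
# Hypothetical pre-processing step: each plain space becomes
# <zwsp><nbsp><zwsp>, so the line breaker sees no SP character it
# could silently drop at the end of a line.
ZWSP = "\u200b"  # ZERO WIDTH SPACE
NBSP = "\u00a0"  # NO-BREAK SPACE


def protect_spaces(text: str) -> str:
    """Replace every U+0020 with the three-character sequence."""
    return text.replace(" ", ZWSP + NBSP + ZWSP)


sample = protect_spaces("a  b")
# Show the code points so the invisible characters are readable.
print([f"U+{ord(c):04X}" for c in sample])
```

Whether this is the right adjustment for every white-space-treatment value is a separate question; the sketch only shows the rewrite itself.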

Ok, so back to your previous message:

2. Removal of white space: This is the current behaviour but it works
only for a single space and not for a sequence of spaces. Actually
because the algorithm removes leading glues/penalties it is mainly a
problem for trailing white space. I am not sure how to best tackle
this. What comes to mind is:

a) Do the same as for leading glues/penalties at the end of the line.
However I am not sure how tricky it would be to determine the boundary
because any 'blocking boxes' (see 1. above) are only placed before but
not after elements. This option suffers from the problem that it will
not remove leading/trailing white space across inline boundaries with
border/padding as these generate zero width boxes to block removal of
the glue elements for the border/padding.

b) Do not generate individual Knuth sequences for each white space
character but instead collect all consecutive white space and create
one glue-penalty sequence for it. Again I am uncertain of the
consequences of doing that. To do that correctly we would need to
collect white space across inline boundaries. This firstly breaks the
current getNextKnuth approach which assumes each LM can generate its
sequences without knowledge of its neighbours. It would also break the
current area info structures as a single Knuth element could now refer
to text snippets from different LMs.

I'm not sure I follow you in all the details of white space handling and here we have borders too ... :-)

I like b) most: after all, this is somewhat similar to the space resolution, as we have interactions between spaces coming from different nodes, and it's difficult to have each LM decide on its own. And I think we could find a way to keep the 1-1 relationship between AreaInfo objects and Positions.

I have tried to play with the elements, and here are a few results: I hope they can help!

At the moment, the sequence for a single space with borders and padding is:

1  glue w=endB&P
2  penalty w=0
3  glue w=(spaceIPD - endB&P - startB&P)
4  box w=0
5  infinite penalty
6  glue w=startB&P

total width = spaceIPD
if break at #2 = endB&P / startB&P  (width left at the end of the line / width at the start of the next line)
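
The two figures above can be checked with a small model of the sequence. This is only a sketch under stated assumptions: the Element class, the INFINITE value and both helper functions are illustrative names, not FOP classes, and the break rule assumed is the usual Knuth one (everything before the chosen penalty stays on the line, discardable elements after it are dropped up to the next box).

```python
from dataclasses import dataclass

INFINITE = 1000  # assumed value meaning "never break here"


@dataclass(frozen=True)
class Element:
    kind: str        # "box", "glue" or "penalty"
    width: int
    penalty: int = 0


def single_space(space_ipd, end_bp, start_bp):
    """Build the six-element sequence for one space between two inlines."""
    return [
        Element("glue", end_bp),                         # 1
        Element("penalty", 0),                           # 2
        Element("glue", space_ipd - end_bp - start_bp),  # 3
        Element("box", 0),                               # 4
        Element("penalty", 0, INFINITE),                 # 5
        Element("glue", start_bp),                       # 6
    ]


def widths_around_break(seq, number):
    """Return (width before, width after) for a break at 1-based #number."""
    i = number - 1
    if seq[i].kind != "penalty" or seq[i].penalty >= INFINITE:
        raise ValueError("not a legal break")
    before = sum(e.width for e in seq[:i]) + seq[i].width
    j = i + 1
    while j < len(seq) and seq[j].kind != "box":
        j += 1  # skip discardable glue and penalties after the break
    after = sum(e.width for e in seq[j:])
    return before, after


seq = single_space(space_ipd=10, end_bp=3, start_bp=2)
print(sum(e.width for e in seq))    # total width when no break is taken
print(widths_around_break(seq, 2))  # (end of line, start of next line)
```

With these sample numbers the total equals spaceIPD, and a break at #2 leaves endB&P on the first line and startB&P on the next.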

If we have two (or more) spaces, we could use the sequence:

1  glue w=endB&P
2  penalty w=0
3  glue w=(- endB&P - startB&P)
4  glue w=spaceIPD1
5  glue w=spaceIPD2
6  box w=0
7  infinite penalty
8  glue w=startB&P

total width = spaceIPD1 + spaceIPD2
if break at #2 = endB&P / startB&P

Glues #4 and #5 have a Position pointing to different AreaInfo objects (from different LMs). This should solve (?) the case of ignore-if-surrounding.
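
A sketch of the two-space sequence, where each space glue carries a reference to its own origin. The owner field is a hypothetical stand-in for the Position / AreaInfo link, and the labels "LM-A" and "LM-B" are made up; the break rule assumed is the usual Knuth one (discardable elements after the break are dropped up to the next box).

```python
from dataclasses import dataclass
from typing import Optional

INFINITE = 1000  # assumed value meaning "never break here"


@dataclass(frozen=True)
class Element:
    kind: str                    # "box", "glue" or "penalty"
    width: int
    penalty: int = 0
    owner: Optional[str] = None  # hypothetical link back to an AreaInfo


def two_spaces(space_ipd1, space_ipd2, end_bp, start_bp):
    """Build the eight-element sequence for two adjacent spaces."""
    return [
        Element("glue", end_bp),                    # 1
        Element("penalty", 0),                      # 2
        Element("glue", -end_bp - start_bp),        # 3
        Element("glue", space_ipd1, owner="LM-A"),  # 4
        Element("glue", space_ipd2, owner="LM-B"),  # 5
        Element("box", 0),                          # 6
        Element("penalty", 0, INFINITE),            # 7
        Element("glue", start_bp),                  # 8
    ]


def widths_around_break(seq, number):
    """Return (width before, width after) for a break at 1-based #number."""
    i = number - 1
    before = sum(e.width for e in seq[:i]) + seq[i].width
    j = i + 1
    while j < len(seq) and seq[j].kind != "box":
        j += 1  # both space glues are discarded together here
    return before, sum(e.width for e in seq[j:])


seq = two_spaces(space_ipd1=10, space_ipd2=12, end_bp=3, start_bp=2)
print(sum(e.width for e in seq))            # spaceIPD1 + spaceIPD2
print(widths_around_break(seq, 2))          # (endB&P, startB&P)
print([e.owner for e in seq if e.owner])    # one owner per space glue
```

Each space keeps exactly one element of its own, so the 1-1 relationship between elements and AreaInfo objects is preserved in this model.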

If white-space-treatment is ignore-if-after, and we have two consecutive spaces we could use the sequence:

1  glue w=endB&P
2  penalty w=0
3  glue w=(spaceIPD - endB&P)
4  penalty w=0
5  glue w=(spaceIPD - startB&P)
6  box w=0
7  infinite penalty
8  glue w=startB&P

total width = 2 * spaceIPD
if break at #2 = endB&P / startB&P
if break at #4 = endB&P + spaceIPD / startB&P

With three or more consecutive spaces:

1  glue w=endB&P
2  penalty w=0
3  glue w=(spaceIPD - endB&P)
4  penalty w=0
5  glue w=spaceIPD
6  penalty w=0
7  glue w=(spaceIPD - startB&P)
8  box w=0
9  infinite penalty
10 glue w=startB&P

total width = 3 * spaceIPD
if break at #2 = endB&P / startB&P
if break at #4 = endB&P + spaceIPD / startB&P
if break at #6 = endB&P + 2 * spaceIPD / startB&P
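
The two listings for ignore-if-after follow one pattern, which can be generated for any number of spaces. The sketch below is only an illustration: the function names and the INFINITE value are assumptions, and elements are plain (kind, width, penalty) tuples rather than FOP objects. It checks the element count and the total width only; the per-break widths depend on how the breaker treats the glue preceding the chosen penalty and are not modelled here.

```python
INFINITE = 1000  # assumed value meaning "never break here"


def ignore_if_after(n, space_ipd, end_bp, start_bp):
    """Build the sequence for n >= 2 consecutive spaces of equal width."""
    if n < 2:
        raise ValueError("pattern is defined for two or more spaces")
    seq = [
        ("glue", end_bp, 0),
        ("penalty", 0, 0),
        ("glue", space_ipd - end_bp, 0),  # first space
    ]
    for _ in range(n - 2):
        # every middle space adds one break opportunity and one full glue
        seq += [("penalty", 0, 0), ("glue", space_ipd, 0)]
    seq += [
        ("penalty", 0, 0),
        ("glue", space_ipd - start_bp, 0),  # last space
        ("box", 0, 0),
        ("penalty", 0, INFINITE),
        ("glue", start_bp, 0),
    ]
    return seq


def total_width(seq):
    return sum(element[1] for element in seq)


def legal_breaks(seq):
    """1-based numbers of the penalties where a break is allowed."""
    return [i + 1 for i, (kind, _, p) in enumerate(seq)
            if kind == "penalty" and p < INFINITE]


for n in (2, 3, 5):
    seq = ignore_if_after(n, space_ipd=10, end_bp=3, start_bp=2)
    print(n, len(seq), total_width(seq), legal_breaks(seq))
```

For n = 2 and n = 3 this reproduces the eight- and ten-element listings, with a total width of n * spaceIPD.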

I did not find a sequence for ignore-if-before yet ...

Regards
   Luca
