Re: Leading/trailing space removal in LineLM

Manuel Mall Tue, 01 Nov 2005 22:01:26 -0800

On Wed, 2 Nov 2005 04:18 am, Simon Pepping wrote:
> On Tue, Nov 01, 2005 at 11:40:42PM +0800, Manuel Mall wrote:
> > This is probably a question for Luca or Simon.
> >
<snip/>
> Glue and penalty items are removed at the start of a line. This is
> part of the Knuth algorithm. It does not touch the matter of
> white-space-collapse. If there is whitespace that may not be
> removed/collapsed at the start of the line, it must be protected by a
> preceding zero-width box. I.o.w., the value of white-space-collapse
> needs to be taken into account at the phase of getNextKnuthElements.
>
Fair enough - I need some help with the Knuth elements then.


During getNextKnuth we only need to consider white-space-treatment, 
because white-space-collapse can be handled completely during 
refinement: whether a sequence of consecutive white space is collapsed 
or not is decided during refinement.
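As a rough illustration of a collapse step that runs entirely during refinement, here is a minimal Java sketch. The class and method names are hypothetical (not FOP's API), and it simplifies the XSL rules by treating only U+0020 as collapsible white space.

```java
// Hypothetical refinement helper; not part of FOP's actual API.
final class CollapseRefinement {
    private CollapseRefinement() {
    }

    /**
     * With collapse == true, each run of consecutive U+0020 spaces is reduced
     * to a single space; with collapse == false the text is returned unchanged.
     * Simplification (assumption): only U+0020 counts as white space here.
     */
    static String refine(String text, boolean collapse) {
        if (!collapse) {
            return text;
        }
        StringBuilder out = new StringBuilder(text.length());
        boolean previousWasSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isSpace = c == ' ';
            // Keep a space only if the previous character was not also a space.
            if (!isSpace || !previousWasSpace) {
                out.append(c);
            }
            previousWasSpace = isSpace;
        }
        return out.toString();
    }
}
```

For example, refine("abc   def", true) yields "abc def", while refine("abc   def", false) returns the input unchanged, so nothing collapse-related is left for getNextKnuth to decide.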

We can also limit white-space-treatment during getNextKnuth to line 
breaks generated by the line breaking algorithm (Knuth algorithm). 
White-space-treatment around hard line breaks (linefeeds, start/end of 
a block) is handled during refinement.
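A minimal sketch of what handling white space around hard breaks during refinement might look like, assuming linefeeds and the start/end of the block are treated alike as hard breaks. The class name and the two boolean flags are hypothetical; the flags stand in for the "before" and "after" parts of the ignore-if-... values, and only U+0020 is removed.

```java
// Hypothetical refinement helper; not part of FOP's actual API.
final class HardBreakRefinement {
    private HardBreakRefinement() {
    }

    /**
     * Removes U+0020 spaces next to hard line breaks. Linefeeds ('\n') and the
     * start/end of the block are treated alike, as hard breaks (assumption).
     *
     * @param ignoreBefore drop spaces directly before a hard break
     * @param ignoreAfter  drop spaces directly after a hard break
     */
    static String refine(String text, boolean ignoreBefore, boolean ignoreAfter) {
        // Limit -1 keeps empty segments, so leading/trailing linefeeds survive.
        String[] segments = text.split("\n", -1);
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < segments.length; i++) {
            String segment = segments[i];
            int start = 0;
            int end = segment.length();
            if (ignoreAfter) {
                // Spaces at the start of a segment follow a linefeed or the block start.
                while (start < end && segment.charAt(start) == ' ') {
                    start++;
                }
            }
            if (ignoreBefore) {
                // Spaces at the end of a segment precede a linefeed or the block end.
                while (end > start && segment.charAt(end - 1) == ' ') {
                    end--;
                }
            }
            if (i > 0) {
                out.append('\n'); // re-insert the linefeed consumed by split()
            }
            out.append(segment, start, end);
        }
        return out.toString();
    }
}
```

With both flags set (the ignore-if-surrounding-linefeed case), "  ab  \n  cd  " becomes "ab\ncd"; setting only one flag trims only one side of each hard break.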

We can also limit white-space-treatment during getNextKnuth to the 
values "preserve" vs. "ignore-if...". The other values are handled 
during refinement. Likewise, the three "ignore-if..." values, that is 
ignore-if-before-linefeed, ignore-if-after-linefeed and 
ignore-if-surrounding-linefeed, can be treated as a single case: 
'delete all white space around a formatter-generated break'.
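The reduction of the white-space-treatment values to the cases getNextKnuth has to care about can be sketched as a small Java mapping. The enum and method names are hypothetical, not FOP's property constants; mapping "ignore" to "handled in refinement" assumes that value has already removed the white space before getNextKnuth runs.

```java
// Hypothetical names; not FOP's property constants.
final class WhiteSpaceCases {
    private WhiteSpaceCases() {
    }

    /** The white-space-treatment values from XSL-FO ("inherit" omitted). */
    enum Treatment {
        IGNORE,
        PRESERVE,
        IGNORE_IF_BEFORE_LINEFEED,
        IGNORE_IF_AFTER_LINEFEED,
        IGNORE_IF_SURROUNDING_LINEFEED
    }

    /** What getNextKnuth has to do with white space at a Knuth-generated break. */
    enum AtKnuthBreak {
        PRESERVE,             // keep the space, protected by a zero-width box
        REMOVE,               // delete all white space around the break
        HANDLED_IN_REFINEMENT // nothing left to decide in getNextKnuth
    }

    static AtKnuthBreak classify(Treatment treatment) {
        switch (treatment) {
            case PRESERVE:
                return AtKnuthBreak.PRESERVE;
            case IGNORE_IF_BEFORE_LINEFEED:
            case IGNORE_IF_AFTER_LINEFEED:
            case IGNORE_IF_SURROUNDING_LINEFEED:
                // All three variants become one case at formatter-generated breaks.
                return AtKnuthBreak.REMOVE;
            case IGNORE:
            default:
                return AtKnuthBreak.HANDLED_IN_REFINEMENT;
        }
    }
}
```

Only two of the results (PRESERVE and REMOVE) need different Knuth element sequences, which is the two-case split described next.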

So we end up with only two cases to consider: preserving white space, 
and removing white space around a line break created by the Knuth 
algorithm.

1. Preserve white space: IMO in this case the space itself is not a 
break opportunity; instead there are two break opportunities, one 
before the space and one after it. That is, a sequence like 
'abc&#x20;def' behaves more like 'abc&#x200b;&#xa0;&#x200b;def', or in 
a more readable notation 'abc<zwsp><nbsp><zwsp>def': our normal space 
becomes a non-breaking space flanked by zero-width spaces, which 
represent the break opportunities. If this is correct, the Knuth 
elements would look like:
glue w=0
box w=0
pen +INFINITE
glue w=<space>
pen
glue w=0
Is this sequence correct? The first and last glue represent the <zwsp> 
and are the break opportunities. The box prevents the removal of the 
space if a break is created before the space. The +INFINITE penalty 
prevents the space from being considered as a break opportunity.
Of course, as usual, these sequences are further complicated in the 
absence of justification and in the presence of border/padding.
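To check a proposed sequence like the one above mechanically, here is a minimal Java sketch that builds it and lists legal breakpoints under the standard Knuth-Plass rules (a glue is a legal break only if immediately preceded by a box; a penalty only if its value is below +INFINITE). The element model, the value used for INFINITE and the parameter for the second, unspecified penalty are all assumptions for illustration, not FOP's KnuthElement classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal model of Knuth elements; not FOP's KnuthElement classes.
final class PreservedSpaceSketch {
    // Stand-in for +INFINITE; the concrete value is an assumption.
    static final int INFINITE = 1000;

    enum Kind { BOX, GLUE, PENALTY }

    static final class Element {
        final Kind kind;
        final int width;
        final int penalty; // only meaningful for PENALTY

        Element(Kind kind, int width, int penalty) {
            this.kind = kind;
            this.width = width;
            this.penalty = penalty;
        }
    }

    static Element box(int width) { return new Element(Kind.BOX, width, 0); }
    static Element glue(int width) { return new Element(Kind.GLUE, width, 0); }
    static Element pen(int penalty) { return new Element(Kind.PENALTY, 0, penalty); }

    /** The proposed sequence for one preserved space; the second penalty's value is left open. */
    static List<Element> preservedSpace(int spaceWidth, int secondPenalty) {
        List<Element> seq = new ArrayList<>();
        seq.add(glue(0));            // <zwsp>: intended break before the space
        seq.add(box(0));             // keeps the space if a line starts here
        seq.add(pen(INFINITE));      // forbids breaking at the space glue
        seq.add(glue(spaceWidth));   // the preserved space itself
        seq.add(pen(secondPenalty)); // "pen" in the list; value unspecified
        seq.add(glue(0));            // <zwsp>: intended break after the space
        return seq;
    }

    /** Legal breakpoints: glue directly after a box, or a penalty below INFINITE. */
    static List<Integer> legalBreaks(List<Element> seq) {
        List<Integer> breaks = new ArrayList<>();
        for (int i = 0; i < seq.size(); i++) {
            Element e = seq.get(i);
            boolean glueAfterBox = e.kind == Kind.GLUE && i > 0
                    && seq.get(i - 1).kind == Kind.BOX;
            boolean finitePenalty = e.kind == Kind.PENALTY && e.penalty < INFINITE;
            if (glueAfterBox || finitePenalty) {
                breaks.add(i);
            }
        }
        return breaks;
    }

    /** Wraps the space sequence in boxes standing for 'abc' and 'def'. */
    static List<Element> abcSpaceDef(int wordWidth, int spaceWidth, int secondPenalty) {
        List<Element> seq = new ArrayList<>();
        seq.add(box(wordWidth));
        seq.addAll(preservedSpace(spaceWidth, secondPenalty));
        seq.add(box(wordWidth));
        return seq;
    }
}
```

Calling legalBreaks(abcSpaceDef(...)) with different values for the second penalty shows which of the elements the standard rules would accept as breakpoints, which can then be compared with the intent that the first and last glue carry the breaks.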

2. Removal of white space: This is the current behaviour, but it works 
only for a single space, not for a sequence of spaces. Because the 
algorithm already removes leading glues/penalties, it is mainly a 
problem for trailing white space. I am not sure how best to tackle 
this. What comes to mind is:

a) Do at the end of the line what is already done for leading 
glues/penalties. However, I am not sure how tricky it would be to 
determine the boundary, because any 'blocking boxes' (see 1. above) are 
only placed before elements, not after them. This option also suffers 
from the problem that it will not remove leading/trailing white space 
across inline boundaries with border/padding, as these generate 
zero-width boxes to block removal of the glue elements for the 
border/padding.
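Option a) can be sketched as the mirror image of the leading-element removal: drop glues and penalties from the end of a line until the first box. This is a hypothetical Java sketch with its own minimal element model (not FOP's classes); it deliberately stops at any box, including zero-width ones, to make the border/padding limitation visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal model; not FOP's classes.
final class TrailingTrimSketch {
    enum Kind { BOX, GLUE, PENALTY }

    static final class Element {
        final Kind kind;
        final int width;

        Element(Kind kind, int width) {
            this.kind = kind;
            this.width = width;
        }
    }

    static Element box(int width) { return new Element(Kind.BOX, width); }
    static Element glue(int width) { return new Element(Kind.GLUE, width); }
    static Element pen() { return new Element(Kind.PENALTY, 0); }

    /**
     * Drops glues and penalties from the end of a line and stops at the first
     * box, whatever its width. A zero-width box (e.g. one emitted for
     * border/padding) therefore also stops the removal.
     */
    static List<Element> trimTrailing(List<Element> line) {
        int end = line.size();
        while (end > 0 && line.get(end - 1).kind != Kind.BOX) {
            end--;
        }
        return new ArrayList<>(line.subList(0, end));
    }
}
```

For a line box(30), glue(5), pen(), glue(5) only the box remains; for box(30), glue(5), box(0), glue(5) the glue in front of the zero-width box survives, which is the cross-inline problem described in a).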

b) Do not generate individual Knuth sequences for each white space 
character, but instead collect all consecutive white space and create 
one glue-penalty sequence for it. Again, I am uncertain of the 
consequences of doing that. To do it correctly we would need to 
collect white space across inline boundaries. First, this breaks the 
current getNextKnuth approach, which assumes each LM can generate its 
sequences without knowledge of its neighbours. It would also break the 
current area info structures, as a single Knuth element could now refer 
to text snippets from different LMs.
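To make the area-info concern in b) concrete, here is a hypothetical Java sketch that merges runs of spaces across fragments from different LMs into one glue each. Fragment, MergedGlue and the uniform space width are illustrative assumptions, not FOP types; the point is that a merged glue has to remember more than one source LM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Fragment and MergedGlue are illustrative, not FOP types.
final class MergedWhiteSpaceSketch {
    /** A piece of text contributed by one layout manager (identified by name). */
    static final class Fragment {
        final String lmName;
        final String text;

        Fragment(String lmName, String text) {
            this.lmName = lmName;
            this.text = text;
        }
    }

    /** One glue covering a whole run of spaces, possibly from several LMs. */
    static final class MergedGlue {
        final int width;
        final List<String> sourceLms;

        MergedGlue(int width, List<String> sourceLms) {
            this.width = width;
            this.sourceLms = sourceLms;
        }
    }

    /**
     * Collects each run of consecutive U+0020 spaces, across fragment
     * boundaries, into one glue of width runLength * spaceWidth.
     * Assumes every space has the same width.
     */
    static List<MergedGlue> mergeSpaceRuns(List<Fragment> fragments, int spaceWidth) {
        List<MergedGlue> glues = new ArrayList<>();
        int runLength = 0;
        List<String> sources = new ArrayList<>();
        for (Fragment fragment : fragments) {
            for (int i = 0; i < fragment.text.length(); i++) {
                if (fragment.text.charAt(i) == ' ') {
                    runLength++;
                    // Record each contributing LM once per run.
                    if (!sources.contains(fragment.lmName)) {
                        sources.add(fragment.lmName);
                    }
                } else if (runLength > 0) {
                    glues.add(new MergedGlue(runLength * spaceWidth, sources));
                    runLength = 0;
                    sources = new ArrayList<>();
                }
            }
        }
        if (runLength > 0) {
            glues.add(new MergedGlue(runLength * spaceWidth, sources));
        }
        return glues;
    }
}
```

For fragments "abc " from one LM and "  def" from a neighbouring inline LM, the result is a single glue of three space widths whose sourceLms names both LMs, which is exactly the situation the current area info structures do not expect.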

Comments please.

> Simon
Thanks

Manuel
