On Wed, 2 Nov 2005 04:18 am, Simon Pepping wrote:
> On Tue, Nov 01, 2005 at 11:40:42PM +0800, Manuel Mall wrote:
> > This is probably a question for Luca or Simon.
> >
<snip/>
> Glue and penalty items are removed at the start of a line. This is
> part of the Knuth algorithm. It does not touch the matter of
> white-space-collapse. If there is whitespace that may not be
> removed/collapsed at the start of the line, it must be protected by a
> preceding zero-width box. I.o.w., the value of white-space-collapse
> needs to be taken into account at the phase of getNextKnuthElements.
>

Fair enough - I need some help with the Knuth elements then.
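For reference, the rule Simon describes can be sketched roughly as follows. This is only an illustration, not FOP code: Box, Glue, Penalty, strip_line_start and the widths are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: int


@dataclass(frozen=True)
class Glue:
    width: int


@dataclass(frozen=True)
class Penalty:
    value: int


def strip_line_start(elements):
    """Drop glue and penalty items up to the first box of a new line."""
    for i, element in enumerate(elements):
        if isinstance(element, Box):
            return elements[i:]
    return []  # nothing but discardable items


SPACE = 3  # hypothetical width of one space

# An unprotected space at the start of a line is discarded ...
unprotected = [Glue(SPACE), Box(10)]
# ... while a preceding zero-width box keeps it in place.
protected = [Box(0), Glue(SPACE), Box(10)]

print(strip_line_start(unprotected))
print(strip_line_start(protected))
```

The zero-width box adds no width to the line; it only stops the scan before the glue is reached.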
During getNextKnuth we only need to consider white-space-treatment, as
white-space-collapse can be handled completely during refinement; that
is, consecutive sequences of white space are either collapsed or not
during refinement.

We can also limit white-space-treatment during getNextKnuth to line
breaks generated by the line breaking algorithm (Knuth algorithm).
white-space-treatment around hard line breaks (linefeeds, start/end of
a block) is handled during refinement.

We can also limit white-space-treatment during getNextKnuth to the
values "preserve" vs "ignore-if...". Other values are handled during
refinement.

We can also treat the three different "ignore-if..." values, that is:
  ignore-if-before-linefeed,
  ignore-if-after-linefeed,
  ignore-if-surrounding-linefeed,
as just one case: 'delete all white space around a formatter generated
break'.

So we end up with only two cases to consider: preserve white space, and
remove white space around a line break created by the Knuth algorithm.

1. Preserve white space:
IMO in this case the space itself is actually not a break opportunity,
but there are now two break opportunities: one before the space and one
after the space. That is, a sequence like 'abc def' is more like
'abc​ ​def' or, in a more readable notation, 'abc<zwsp><nbsp><zwsp>def'.
In other words, our normal space becomes a non-breakable space flanked
by zero-width spaces which represent the break opportunities. If this
is correct the Knuth elements would look like:

  glue w=0
  box w=0
  pen +INFINITE
  glue w=<space>
  pen
  glue w=0

Is this sequence correct? The first and last glue represent the <zwsp>
and are break opportunities. The box prevents the removal of the space
if a break is created before the space. The penalty prevents the space
from being considered as a break opportunity. Of course, as usual these
sequences are further complicated in the absence of justification and
in the presence of border/padding.

2. Removal of white space:
This is the current behaviour, but it works only for a single space and
not for a sequence of spaces. Actually, because the algorithm removes
leading glues/penalties, it is mainly a problem for trailing white
space. I am not sure how best to tackle this. What comes to mind is:

a) Do the same at the end of the line as is done for leading
glues/penalties. However, I am not sure how tricky it would be to
determine the boundary, because any 'blocking boxes' (see 1. above) are
only placed before but not after elements. This option suffers from the
problem that it will not remove leading/trailing white space across
inline boundaries with border/padding, as these generate zero-width
boxes to block removal of the glue elements for the border/padding.

b) Do not generate individual Knuth sequences for each white space
character, but instead collect all consecutive white space and create
one glue-penalty sequence for it. Again I am uncertain of the
consequences of doing that. To do it correctly we would need to collect
white space across inline boundaries. This firstly breaks the current
getNextKnuth approach, which assumes each LM can generate its sequences
without knowledge of its neighbours. It would also break the current
area info structures, as a single Knuth element could now refer to text
snippets from different LMs.

Comments please.

> Simon

Thanks

Manuel
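To make options (a) and (b) under case 2 a bit more concrete, here is a rough sketch for a single text fragment. It is not FOP code: Box, Glue, Penalty, strip_line_end, elements_for_text and the unit widths are hypothetical, a single glue stands in for the whole glue-penalty sequence, and the glue width is assumed to be the total width of the run. It deliberately ignores the hard part, namely white space that spans inline boundaries.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Box:
    width: int


@dataclass(frozen=True)
class Glue:
    width: int


@dataclass(frozen=True)
class Penalty:
    value: int


CHAR_WIDTH = 1   # hypothetical width of a non-space character
SPACE_WIDTH = 1  # hypothetical width of a space


def strip_line_end(elements):
    """Option (a): drop trailing glue/penalty items back to the last box."""
    end = len(elements)
    while end > 0 and not isinstance(elements[end - 1], Box):
        end -= 1
    return elements[:end]


def elements_for_text(text):
    """Option (b): one box per word, one glue per run of white space."""
    out = []
    for is_space, run in groupby(text, key=str.isspace):
        n = len(list(run))
        if is_space:
            # Assumption: the glue carries the width of the whole run.
            out.append(Glue(n * SPACE_WIDTH))
        else:
            out.append(Box(n * CHAR_WIDTH))
    return out


# (a) trailing glue/penalty items are removed ...
print(strip_line_end([Box(3), Glue(1), Penalty(0)]))
# ... unless a zero-width box (e.g. for border/padding) follows them.
print(strip_line_end([Box(3), Glue(1), Box(0)]))
# (b) three spaces become a single glue, so a break there drops them all.
print(elements_for_text("abc   def"))
```

With (b), a break at the single glue discards the whole run in one go, which is the effect that per-character sequences do not give for trailing white space.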
