On Thu, 3 Nov 2005 05:56 pm, Manuel Mall wrote:
> On Wed, 2 Nov 2005 11:58 pm, Luca Furini wrote:
> > Manuel Mall wrote:
<snip/>
> >
> > If we have two (or more) spaces, we could use the sequence:
> >
> > 1 glue w=endB&P
> > 2 penalty w=0
> > 3 glue w=(- endB&P - startB&P)
> > 4 glue w=spaceIPD1
> > 5 glue w=spaceIPD2
> > 6 box w=0
> > 7 infinite penalty
> > 8 glue w=startB&P
> >
> > total width = spaceIPD1 + spaceIPD2
> > if break at #2 = endB&P / startB&P
> >
> > Glues #4 and #5 have a Position pointing to different AreaInfo
> > objects (from different LMs). This should solve (?) the case of
> > ignore-if-surrounding.
>
> Excellent, because ignore-if-surrounding is the only case we have to
> consider. For formatter generated line breaks this is the same as
> ignore-if-after... and ignore-if-before... because we control the
> position of the line break we can logically position it such that for
> the before and after cases we can remove the spaces. Therefore IMO we
> don't need any other Knuth sequences.
>
> However, as these are "integrated sequences" we still have to carry
> info about this between LMs. This is "for further study" and
> suggestions are welcome.
>
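To check the arithmetic in Luca's sequence, here is a rough sketch in
Python. It is not FOP code: the Element class, the INFINITE value and
the function names are all made up for illustration, and the widths in
the usage note are arbitrary. It assumes the usual Knuth rule that glue
and penalties following a break are discarded up to the next box.

```python
from dataclasses import dataclass

INFINITE = 1000  # assumed stand-in for an "infinite" penalty value


@dataclass(frozen=True)
class Element:
    kind: str          # "box", "glue" or "penalty"
    width: int
    penalty: int = 0   # only meaningful for kind == "penalty"


def space_sequence(end_bp, start_bp, space_ipd1, space_ipd2):
    """Build the eight-element sequence quoted above (hypothetical helper)."""
    return [
        Element("glue", end_bp),               # 1
        Element("penalty", 0),                 # 2 (the legal break)
        Element("glue", -end_bp - start_bp),   # 3
        Element("glue", space_ipd1),           # 4
        Element("glue", space_ipd2),           # 5
        Element("box", 0),                     # 6
        Element("penalty", 0, INFINITE),       # 7
        Element("glue", start_bp),             # 8
    ]


def unbroken_width(seq):
    """Width contributed when no break is taken inside the sequence."""
    return sum(e.width for e in seq)


def widths_at_break(seq, index):
    """Return (width ending the line, width starting the next line)
    for a break at the penalty seq[index]."""
    before = sum(e.width for e in seq[:index]) + seq[index].width
    rest = seq[index + 1:]
    # Discard glue and penalties after the break, up to the first box.
    i = 0
    while i < len(rest) and rest[i].kind != "box":
        i += 1
    after = sum(e.width for e in rest[i:])
    return before, after
```

With endB&P=3000, startB&P=2000, spaceIPD1=4000 and spaceIPD2=5000 the
unbroken width is 9000 (spaceIPD1 + spaceIPD2), and a break at element
#2 (list index 1) leaves 3000 at the end of the line and 2000 at the
start of the next, which matches the figures Luca gives.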
Luca, as you are the expert on the Knuth sequences with respect to
break/space handling, I think it would be good if we could document all
the different cases we have so far and those we envisage in the near
future. Here are some of the combinations I have identified:
1. Non-breaking / non-elastic space => probably just a normal character,
i.e. part of a word.
2. Non-breaking / elastic space - e.g. U+00A0 Non-breaking space
=> Must prevent break
=> Must handle text-align
3. Break / non-elastic - e.g. U+200B ZWSP, any other break between two
characters not involving adding or removing space/characters
=> Must handle border/padding
=> Must handle text-align
4. Break / non-elastic / remove if not break - e.g. U+00AD Soft hyphen
=> Must remove if not at break
=> Must handle border/padding
=> Must handle text-align
5. Break / non-elastic / add character if break - e.g. hyphenation
=> Must add space for hyphen if at break
=> Must handle border/padding
=> Must handle text-align
6. Breaking / elastic / non-removable - e.g. U+3000 Ideographic space
=> Must handle border/padding
=> Must handle text-align
Question: XSL-FO does not define U+3000 as removable white space, but
would it be removed at a line break under common CJK typesetting
conventions?
7. Breaking / elastic / removable - e.g. U+0020 Space
=> Can occur in runs which must be wholly removed
=> Must handle border/padding
=> Must handle text-align
Have I missed any combinations? For example, is there a "break /
non-elastic / remove at break" case?
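One way to look for gaps is to enumerate the combinations mechanically.
The sketch below is only my reading of the list above: the three axes
(breaking, elastic, what happens at or away from a break) and all the
names are assumptions made for illustration, not anything taken from
the spec or from FOP.

```python
from itertools import product

# Assumed "action" axis, read off the seven cases listed above.
NONE = "none"
REMOVE_IF_NO_BREAK = "remove-if-not-break"
ADD_AT_BREAK = "add-at-break"
REMOVE_AT_BREAK = "remove-at-break"
ACTIONS = (NONE, REMOVE_IF_NO_BREAK, ADD_AT_BREAK, REMOVE_AT_BREAK)

# (breaking, elastic, action) -> case number in the list above.
LISTED = {
    (False, False, NONE): 1,               # ordinary character
    (False, True, NONE): 2,                # U+00A0
    (True, False, NONE): 3,                # U+200B
    (True, False, REMOVE_IF_NO_BREAK): 4,  # U+00AD
    (True, False, ADD_AT_BREAK): 5,        # hyphenation
    (True, True, NONE): 6,                 # U+3000
    (True, True, REMOVE_AT_BREAK): 7,      # U+0020
}


def unlisted_combinations():
    """Return the combinations that have no case number yet."""
    missing = []
    for breaking, elastic, action in product((False, True), (False, True), ACTIONS):
        # A break-related action makes no sense without a break opportunity.
        if not breaking and action != NONE:
            continue
        if (breaking, elastic, action) not in LISTED:
            missing.append((breaking, elastic, action))
    return missing
```

Under these assumptions unlisted_combinations() reports three gaps:
break / non-elastic / remove at break (the one asked about above),
break / elastic / remove if not break, and break / elastic / add at
break. Whether any of them corresponds to a real character or
situation is exactly the open question.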
<snip/>
Regards
Manuel