https://issues.apache.org/bugzilla/show_bug.cgi?id=45097





--- Comment #14 from Andreas L. Delmelle <[EMAIL PROTECTED]>  2008-11-25 13:12:37 PST ---
(In reply to comment #12)

Sorry to chime in so late...

> Based on my novice analysis, it appears the various KnuthElements provide the
> following purposes:
<snip />

Entirely correct interpretation.

A box is never a break-possibility, unless it is preceded by a penalty
indicating one. Glues are always a break-possibility, unless preceded by a
penalty prohibiting one. That's the general idea.
If a glue simply appears in between two boxes, then when it is chosen as the
effective break, it dissolves. To generate the effect of preserved spaces or
account for alignment other than "justify", one needs a sequence of those
elements to represent the different effects (break/no-break).
If a glue is followed by a glue, then the latter becomes the more favorable
break. The former could then simply be discarded as a possibility.
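
To make the rules above a bit more concrete, here is a minimal sketch in
Python. It is not FOP code: the class names, the INFINITE constant and
legal_breaks() are made up for illustration, and it follows the textbook
Knuth-Plass convention (a glue is a legal break only when it directly follows
a box), which is stricter than the informal summary above and may differ from
what FOP's own implementation does.

```python
from dataclasses import dataclass

# Assumed value standing in for "infinity"; not taken from FOP.
INFINITE = 1000


@dataclass
class Box:
    width: int


@dataclass
class Glue:
    width: int
    stretch: int = 0
    shrink: int = 0


@dataclass
class Penalty:
    value: int
    width: int = 0


def legal_breaks(elements):
    """Return the indices at which a line break is allowed."""
    breaks = []
    for i, el in enumerate(elements):
        if isinstance(el, Penalty):
            # A finite penalty is itself a break-possibility.
            if el.value < INFINITE:
                breaks.append(i)
        elif isinstance(el, Glue):
            # Textbook rule: glue is a break only directly after a box,
            # so a glue behind an infinite penalty is never chosen.
            if i > 0 and isinstance(elements[i - 1], Box):
                breaks.append(i)
    return breaks


elements = [Box(10), Glue(3), Box(10), Penalty(INFINITE), Glue(3), Box(10)]
print(legal_breaks(elements))  # [1]
```

The second glue is not reported because the infinite penalty in front of it
prohibits the break, while the first glue, sitting between two boxes, is.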

> This matches what Andreas shows as the sequence for a preserved space (glue,
> penalty=0, glue, aux. box w=0, penalty=inf, glue).  Is my analysis of each
> KnuthElement and the purpose it serves correct?  I still don't understand how
> it gets the stretch values that it does, 

That point has been called into question recently: 10008 is exactly the width
of 3 normal spaces, chosen to handle alignment other than "justify". It has
proven to have nasty side-effects for long blocks with a relatively small
line-width (multi-column documents), where three spaces represent a large
portion of the line. The suggestion has been raised to make this a percentage
of the line-width, and IIC, we would also need to take the font-size into
account.
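
A rough illustration of the arithmetic, as a sketch only. The space width is
simply 10008 / 3; the two line widths and the 2% figure are invented for the
example, and the helper names are hypothetical. Font-size is not modelled
here.

```python
# 10008 / 3, i.e. the width of one "normal" space implied by the text.
SPACE_WIDTH = 3336


def fixed_stretch(space_width=SPACE_WIDTH):
    """Current behaviour as described: stretch equals three spaces."""
    return 3 * space_width


def relative_stretch(line_width, percent):
    """Suggested alternative: stretch as a percentage of the line width."""
    return line_width * percent // 100


def share_of_line(stretch, line_width):
    """Fraction of the line width taken up by the stretch."""
    return stretch / line_width


wide = 468000    # hypothetical single-column line width
narrow = 150000  # hypothetical column width in a multi-column layout

print(fixed_stretch())                                   # 10008
print(round(share_of_line(fixed_stretch(), wide), 3))    # 0.021
print(round(share_of_line(fixed_stretch(), narrow), 3))  # 0.067
print(relative_stretch(narrow, 2))                       # 3000
```

With a fixed stretch, the narrow column gives the same three spaces roughly
three times the relative weight; a percentage keeps the ratio constant.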

On the one hand, the TextLM optimizes the search for linebreaks by merging the
characters of a word into one single element, rather than generating 1 element
per character. (Even with hyphenation, we only get one box per hyphenated
word-fragment.) In terms of the algorithm, there is no difference between an
uninterrupted sequence of fixed-size boxes and a single box spanning the same
width. The most elementary representation would be one box per regular
character and one glue for a space. Since we already know that the
letter-boxes will be kept together, we only generate the one box. If
hyphenation is enabled, the word-box is later split into multiple boxes, with
additional flagged penalties in between.
On the other hand, spaces generate multiple elements for one single space
character (and sequences of space-characters are currently not glued together
to a single element, IIC).
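
A small sketch of that idea, using the elementary representation (one glue
per space character) rather than the multi-element sequence FOP really
generates for a space. The function names, the tuple layout, the fixed
character width and the hyphen width are all assumptions made for the
example; this is not the TextLM code.

```python
import re


def text_to_elements(text, char_width, space_width):
    """One box per word, one glue per space character (runs not merged)."""
    elements = []
    for token in re.findall(r"\S+|\s", text):
        if token.isspace():
            elements.append(("glue", space_width))
        else:
            # All letters of the word are merged into a single box.
            elements.append(("box", char_width * len(token)))
    return elements


def split_at_hyphenation_points(word, points, char_width, hyphen_width):
    """Split a word-box into fragment boxes with flagged penalties between."""
    elements = []
    start = 0
    for point in points:
        elements.append(("box", char_width * (point - start)))
        elements.append(("flagged-penalty", hyphen_width))
        start = point
    elements.append(("box", char_width * (len(word) - start)))
    return elements


print(text_to_elements("line  breaking", 500, 3336))
# [('box', 2000), ('glue', 3336), ('glue', 3336), ('box', 4000)]
print(split_at_hyphenation_points("breaking", [5], 500, 333))
# [('box', 2500), ('flagged-penalty', 333), ('box', 1500)]
```

The fragment boxes add up to the width of the original word-box, which is why
the algorithm sees no difference until a hyphenation break is actually taken.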

Looking closer at the Wiki again, I realize that the sequence for a simple
preserved space looks surprisingly similar to that of a simple break in case of
centered text, apart from the stretch/shrink... and in that case, the trailing
glue there is /meant/ to always be pushed to the next line.

> it seems that a possible fix to this undesirable behavior is to move the break
> possibility from the beginning to the end of the boilerplate sequence. 

Could indeed very well be the solution. If so, the auxiliary box may not even
be needed anymore (?)
I'll look into it. At any rate, it seems like the sequence should be
drastically simplified. Specifying white-space-preserve should not mean that
suddenly, it becomes more attractive to break before the space. The break
should still be strongly discouraged. In the most elementary case, if a glue is
preceded by a box, that condition is easily satisfied.
I think the cases where white-space-preserve really plays a part come down to:
1) white space around preserved linefeeds
2) necessary breaks in the middle of a sequence of non-collapsed white-space

For 1), the solution so far has been to end the current paragraph and start a
new one. One TextLM returns a sequence of element-lists to the LineLM.
If a space were simply represented by a glue, it would dissolve higher up. Due
to the added auxiliary box, at least the auxiliary glue is preserved and does
generate the right effect here.
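
To illustrate why the auxiliary box matters, here is a sketch of the usual
Knuth-Plass clean-up after a break: everything discardable (glue, penalties)
is dropped until the next box. The element order of the preserved-space
sequence is the one quoted above; the widths, the INFINITE constant and the
function name are invented for the example, and FOP's real handling may
differ.

```python
# Assumed value standing in for "infinity"; not taken from FOP.
INFINITE = 1000


def start_of_next_line(elements, break_index):
    """Drop the break element and all glue/penalties up to the next box.

    Elements are (kind, n) tuples; n is a width for boxes and glues and
    the penalty value for penalties.
    """
    i = break_index + 1
    while i < len(elements) and elements[i][0] != "box":
        i += 1
    return elements[i:]


# A space represented by a single glue between two words.
plain = [("box", 5000), ("glue", 3336), ("box", 5000)]
print(start_of_next_line(plain, 1))
# [('box', 5000)]

# glue, penalty=0, glue, aux. box w=0, penalty=inf, glue (widths made up).
preserved = [
    ("box", 5000),
    ("glue", 0), ("penalty", 0), ("glue", 0),
    ("box", 0), ("penalty", INFINITE), ("glue", 3336),
    ("box", 5000),
]
print(start_of_next_line(preserved, 2))
# [('box', 0), ('penalty', 1000), ('glue', 3336), ('box', 5000)]
```

In the plain case the space vanishes at the break; with the zero-width box in
front of it, the last glue survives and ends up on the next line.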

For 2), I'm thinking of very extreme (and highly unusual) cases, where it
becomes necessary to choose 'a' break, but the choice is between white-space
characters only. If white-space treatment is "preserve", a portion of
white-space should, strictly speaking, be pushed to the next line, and
influence alignment there... but ideally, if it all fits on one line, that
possibility should obviously be preferred above all else.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
