Hi,

Dario Laera wrote:
> Hi Simon,
> 
> thanks for your reply.
> 
> Il giorno 23/ott/08, alle ore 21:43, Simon Pepping ha scritto:
> 
>> Hi Dario,
>>
>> This is an interesting study. I need some more time to understand the
>> implications fully. At first sight you prove that for normal documents
>> the improvement is small. The paragraphs need to get long before your
>> strategy makes a difference. This is interesting, however, for long
>> chapters with many pages, as you mentioned in your earlier email.
> 
> ATM I prefer to talk about paragraphs only: in the test I've done today
> I saw that for page breaking there is always just one active node. So
> it's clear why formatting the xsl-fo recommendation, that is over 400
> pages long but with short para, doesn't get faster. I need to
> investigate in this area.
> 
>> It is clear why long paragraphs make a difference. Why does one- or
>> two-column layout make a large difference? Simply due to the twice
>> larger number of pages? I do not understand the left-aligned case. Is
>> this not just the same as a first-fit layout?
> 
> Nice questions... I'm trying to understand this behavior too, the first
> time I've implemented the pruning on prototype was for another reason
> and I accidentally noticed the performance boost :)
> About one or two columns, or better, long or short lines: again, I don't
> know why, maybe it's just because the double number of breaks; I thing I
> noted is that for the same number of active node with shorter lines the
> gap between startLine and endLine is wider than with long lines. I don't
> know if this is meaningful.

In left-aligned mode this can be explained by looking at [1]: a line can
be cut if its natural length is between (line-width - 3 sp-width) and
line-width. So, at any break point in the paragraph, where the total
width is w, you can typeset the paragraph on a number of lines between
w / line-width (every line full) and w / (line-width - 3 sp-width)
(every line “empty”). This interval increases when the line width
decreases.

Example: sp-width = 0.1 cm, line-width = 17 cm, w = 1700 cm:
    between 100 and 101 lines
with line-width = 8 cm (double column):
    between 213 and 220 lines

Obviously the interval increases with the number of lines, but this
becomes unrealistic.

[1] http://wiki.apache.org/xmlgraphics-fop/LineBreaking

> About left-aligned or justified: with the latter *sometimes* having
> threshold=1.0 is enough (I think because of stretchable glues) so
> obviously the number of active node is reduced, while the former will
> always fall in threshold=20.0 and in force mode (talking about my
> tests). Anyway, while I'm not sure short/long lines really makes
> difference, it's evident that non justified text produce a lot more of
> active nodes than justified ones.

In justified mode, every line must fit the line width with quite a low
tolerance. In left-aligned mode the algorithm can be much more liberal,
and a rather big amount of blank space is allowed at the end of the line
in comparison to justify. Hence the higher number of active nodes for
ragged lines.

In justified mode, longer lines should lead to more break opportunities,
so more active nodes. Indeed, on every line there will be more spaces to
play with, giving the content more chance to have enough of
shrinkability or stretchability to fit on the line.
In left-aligned mode this is the other way around: spaces are not
stretchable so having more of them doesn’t help. However, at every break
opportunity a big stretchable space is added, that is likely to make the
content fit. The shorter the line, the more often that space is added,
the more break opportunities it results into.

<snip/>

[from another post]
> A really strange thing: is it theoretically possible that the amount 
> of active nodes is greater than the number of knuth elements? As you 
> can see in the graph, for short lines there are >14000 active nodes 
> but only ~7000 knuth elements...

Line breaking is a bin-packing problem [2], so the potential number of
active nodes is exponential to the number of words you have in your
paragraph. So, yes, it’s normal to have more active nodes than elements
in the sequence.

[2] http://en.wikipedia.org/wiki/Bin_packing_problem


Vincent

Reply via email to