Simon Pepping wrote:
> On Fri, Jan 18, 2008 at 12:09:14AM +0100, Andreas L Delmelle wrote:
>> I have always somehow assumed there to be a threshold in the most common 
>> cases. In the sense that certain (maybe in fact even the bulk of) feasible 
>> breaks can already be excluded quite early, even if the sequence consists 
>> of millions of elements. Simply because they would, in intermittent runs, 
>> repeatedly be considered to be 'bad' breaks, or because they become 'worse' 
>> than the previous run.
>> Something like this happens already, in a way, I think, albeit very deeply 
>> embedded inside the algorithm, and a bit late too...
>> Once we start determining actual breaks, the further we progress in the 
>> page-sequence, the more layout-possibilities that need to be excluded. We 
>> simply cannot consider all of them, including the ones that would lead to 
>> unreasonable results.
> 
> Study the book, Andreas. Donald Knuth has treated all this, in a
> manner that is quite complete. As Vincent said, this is the principle
> of dynamic programming.
> 
> Earlier feasible pagebreaks are kept 'alive' because later feasible
> pagebreaks keep a chain of references to them. Early feasible
> pagebreaks that are not part of a 'best' chain of pagebreaks up to one
> of the active nodes, are not referenced anymore and therefore
> disappear from memory.
> 
> Simon
> 

Not so much disappear from memory, as be removed from consideration, I
think.

This was straight-forwardly true of TeX-1, let's call it. Each decided
break referred back to only one predecessor. But, as the article notes,
this approach was not adequate in practice. The algorithm was revisited,
and TeX-2 resulted. The difference is in tracking so-called "fitness
classes" and break-point/line-number associations. This results in
break-point/line-number/fitness-class triples. There may be multiple
line numbers, and up to four fitness classes. Only one of these paths is
optimal at the point where the break is chosen; however, penalties
associated with variations in packing density between contiguous lines
can result in another path being selected when the subsequent break is
selected.

Similarly, the requirement to shrink or expand by a line at the end of
layout can result in a sub-optimal layout.

In the first version, the accumulated demerit value of a line between
two breakpoints, y and z, represented the complete analysis of the
breaks to z. Everything before y was lost, the theory being that it was
not longer needed. In the second version, the problem of contrasting
line packing densities was addressed by carrying forward information
about the relationship between y and a set of nodes, (x1, x2, x3, x4)
from which y was reachable. These represented the best path to y for
each of the set of line packing densities. This set was used to
calculate the minimum demerit path to the line y-z, say x2-y-z. Nodes
(x1, x3, x4) could then be discarded. For z, a set of possible
antecedents (y, y2, y3, y4) would be constructed, etc.

TeX-2 was effectively able to look back a line by holding node sets for
longer.

Compare this with the management of line numbers. For each optimal
breakpoint discovered, y, there is also an optimal line number, n. TeX-2
tracks also any breakpoints which lead to y, such that the number of the
line ending at y is other than n. For each such line number, it keeps
the optimal preceding breakpoint. If the optimal number of lines in the
paragraph is n, and it is decided to layout n-1 lines, there is, AFAICT,
no necessary relationship between the the given layout and the optimal
n-1 line layout of the paragraph. It is _a_ solution that does not
involve back-tracking.

An interesting consideration is hyphen ladders longer than two. Two
consecutive hyphens are easy to deal with, but what if you want to
impose a higher order of penalty for three? You could do something
similar to the fitness classes. If an optimal break involved consecutive
hyphens, you could keep also the next best non-hyphen solution. Then, if
 considering a third hyphen, and the presence of a non-hyphenated option
indicated that the break being considered would lead to a triple, you
could apply the extra penalty, and perhaps choose the non-hyphenated
predecessor.

So, at the point where you are considering the possible break
hyphen(alt-non-hyphen);hyphen;hyphen, you can decide on
alt-non-hyphen;hyphen;hyphen. What, in general, you cannot choose is
hyphen;other-non-hyphen;hyphen, because the hyphen;other-non-hyphen
break option has already been rejected.

Peter

-- 
Peter B. West <http://cv.pbw.id.au/>
Folio <http://defoe.sourceforge.net/folio/>

Reply via email to