Simon Pepping wrote: > On Fri, Jan 18, 2008 at 12:09:14AM +0100, Andreas L Delmelle wrote: >> I have always somehow assumed there to be a threshold in the most common >> cases. In the sense that certain (maybe in fact even the bulk of) feasible >> breaks can already be excluded quite early, even if the sequence consists >> of millions of elements. Simply because they would, in intermittent runs, >> repeatedly be considered to be 'bad' breaks, or because they become 'worse' >> than the previous run. >> Something like this happens already, in a way, I think, albeit very deeply >> embedded inside the algorithm, and a bit late too... >> Once we start determining actual breaks, the further we progress in the >> page-sequence, the more layout-possibilities that need to be excluded. We >> simply cannot consider all of them, including the ones that would lead to >> unreasonable results. > > Study the book, Andreas. Donald Knuth has treated all this, in a > manner that is quite complete. As Vincent said, this is the principle > of dynamic programming. > > Earlier feasible pagebreaks are kept 'alive' because later feasible > pagebreaks keep a chain of references to them. Early feasible > pagebreaks that are not part of a 'best' chain of pagebreaks up to one > of the active nodes, are not referenced anymore and therefore > disappear from memory. > > Simon >
Not so much disappear from memory, as be removed from consideration, I think. This was straight-forwardly true of TeX-1, let's call it. Each decided break referred back to only one predecessor. But, as the article notes, this approach was not adequate in practice. The algorithm was revisited, and TeX-2 resulted. The difference is in tracking so-called "fitness classes" and break-point/line-number associations. This results in break-point/line-number/fitness-class triples. There may be multiple line numbers, and up to four fitness classes. Only one of these paths is optimal at the point where the break is chosen; however, penalties associated with variations in packing density between contiguous lines can result in another path being selected when the subsequent break is selected. Similarly, the requirement to shrink or expand by a line at the end of layout can result in a sub-optimal layout. In the first version, the accumulated demerit value of a line between two breakpoints, y and z, represented the complete analysis of the breaks to z. Everything before y was lost, the theory being that it was not longer needed. In the second version, the problem of contrasting line packing densities was addressed by carrying forward information about the relationship between y and a set of nodes, (x1, x2, x3, x4) from which y was reachable. These represented the best path to y for each of the set of line packing densities. This set was used to calculate the minimum demerit path to the line y-z, say x2-y-z. Nodes (x1, x3, x4) could then be discarded. For z, a set of possible antecedents (y, y2, y3, y4) would be constructed, etc. TeX-2 was effectively able to look back a line by holding node sets for longer. Compare this with the management of line numbers. For each optimal breakpoint discovered, y, there is also an optimal line number, n. TeX-2 tracks also any breakpoints which lead to y, such that the number of the line ending at y is other than n. For each such line number, it keeps the optimal preceding breakpoint. If the optimal number of lines in the paragraph is n, and it is decided to layout n-1 lines, there is, AFAICT, no necessary relationship between the the given layout and the optimal n-1 line layout of the paragraph. It is _a_ solution that does not involve back-tracking. An interesting consideration is hyphen ladders longer than two. Two consecutive hyphens are easy to deal with, but what if you want to impose a higher order of penalty for three? You could do something similar to the fitness classes. If an optimal break involved consecutive hyphens, you could keep also the next best non-hyphen solution. Then, if considering a third hyphen, and the presence of a non-hyphenated option indicated that the break being considered would lead to a triple, you could apply the extra penalty, and perhaps choose the non-hyphenated predecessor. So, at the point where you are considering the possible break hyphen(alt-non-hyphen);hyphen;hyphen, you can decide on alt-non-hyphen;hyphen;hyphen. What, in general, you cannot choose is hyphen;other-non-hyphen;hyphen, because the hyphen;other-non-hyphen break option has already been rejected. Peter -- Peter B. West <http://cv.pbw.id.au/> Folio <http://defoe.sourceforge.net/folio/>