Hi Dario,

I’ve finally had the time to look at your patch. IIUC, your idea is to do best-fit on all but the final X lines, and total-fit on those final X lines. I have to say, I’m a bit doubtful about the benefit of this approach, in terms of both performance and quality, compared to a plain best-fit version (see below):
Dario Laera wrote:
> Hi Vincent,
>
> I wrote a new patch that implements a kind of ?-fit algorithm at
> page level. Let N be the page number of the last page layout obtained,
> and X a variable representing the depth of the layout tree I want to
> keep: each time N increases (and its value is greater than X) I
> perform a pruning of the layout tree, choosing as the new root the
> best layout for the page with index N-X. Here is an example: the two
> graphs below represent the trees of possible page layouts; in the
> first tree we have N=4. If X is set to 3, it's time to prune the tree
> by choosing the best subtree with a page-1 layout as root. In the
> example I suppose that 1a has lower demerits than 1b, so I prune the
> tree from 1a. Alternatively I could choose the page-1 layout that has
> the best leaf as a descendant (currently not implemented).
>
>        ____0____
>       /         \
>    *1a*          1b
>    /  \           |
>   /    \          |
>  2a     2b       2c
>  / \    / \      / \
> 3a  3b 3c  3d  3e  3f
>             |
>            4a
>
>      1a
>     /  \
>    /    \
>  2a      2b
>  / \    /  \
> 3a  3b 3c   3d
>              |
>             4a

In this example (X = 3), total-fit is performed until a first layout for line 4 is found. Then the best first line is chosen and the tree is cut accordingly (1b and its descendants). When a first layout for line 5 is found, the same happens again: total-fit is performed on lines 2 to 5, giving the tree that has 1a as root; then the best second line is chosen and the rest is cut off (say, 2a and descendants). And so on and so on.

So, roughly speaking, given a paragraph of n lines, you’re doing (n - x) “local”, x-line-wide total fits, along with (n - x) best fits. In the end, (n - x) lines will have been chosen using the best-fit method (despite the total fits running in parallel), and the final x lines using a total fit with line (n - x - 1) as the root. This sounds like a lot of work for a result that, in the end, may not look much better than best-fit.
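To make sure we’re talking about the same computation, here is a toy, runnable sketch of the scheme as I understand it (the two-way fork and the demerit values are completely made up, just to give the tree a shape — this is in no way FOP code):

```python
# Toy model: a layout is (total demerits, string of break choices); each
# new line forks every candidate into two alternatives, "a" (cheaper)
# and "b" (more expensive).

def extend(leaf):
    """Produce the two candidate layouts for the next line."""
    demerits, choices = leaf
    return [(demerits + 1, choices + "a"), (demerits + 2, choices + "b")]

def node_demerits(prefix):
    """Demerits of the tree node reached by a given sequence of choices."""
    return sum(1 if c == "a" else 2 for c in prefix)

def sliding_window_fit(n_lines, x):
    """Total-fit over a window of x lines: whenever the tree gets deeper
    than x, commit the best node at the trailing edge (best fit) and cut
    off every other subtree."""
    leaves = [(0, "")]
    for line in range(1, n_lines + 1):
        # local total fit: extend every surviving layout by one line
        leaves = [child for leaf in leaves for child in extend(leaf)]
        if line > x:
            cut = line - x  # the line being committed by best fit
            best = min({l[1][:cut] for l in leaves}, key=node_demerits)
            leaves = [l for l in leaves if l[1].startswith(best)]
    return min(leaves)[1]  # total fit over the remaining x lines
```

With this toy cost model the “a” break is always best, so sliding_window_fit(5, 2) returns "aaaaa"; the point is only the shape of the computation: (n - x) lines committed one by one, each after a local x-line total fit, and the last x lines chosen by the final total fit.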
Also, the visual result might look strange, with the final lines looking more “even” than the starting ones (but this has to be confirmed on real test cases). I’m not sure there’s a real-life use case for such an optimization: people wanting speed will still be less happy than with plain best-fit, and people wanting quality are ready to pay the price for it anyway.

A thing that might be interesting is to regularly cut down the number of active nodes, every x lines: once all of the layouts for line x have been found, select the best active node and discard all the other ones. Then do that again for lines 2x, 3x, 4x, etc. While similar, this has the advantage that every run of x lines will have been determined by a local total-fit method. In effect the paragraph will be made of a sum of local optimums, which may or may not correspond to the global optimum. But even in that case, I’m not sure this is worth the additional complexity in a piece of code that’s already complicated enough.

Another option might be to set a limit on the amount of consumed memory (that is, on the number of active nodes, which in turn has an effect on processing time). Once the maximal number of active nodes is reached, start discarding the nodes with the highest demerits. But it remains to be seen whether such a heuristic proves efficient, and what limit to set. As we can see in your other message, figures may change radically from one document to another.

In the end, I would be more inclined to implement a gradation in layout quality (best fit, total fit over a page sequence, total fit over the document, total fit over the document plus paragraph line numbers, etc.), rather than one or several pruning methods. I think that should be easier to implement, yet provide enough flexibility to satisfy all kinds of users.

Sorry if that sounds a bit negative. Since this is anything but a simple topic, I may well have missed the interest of your approach.
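For what it’s worth, here is how I picture those two heuristics — a rough Python sketch with invented names, certainly not the way it would look in the actual code:

```python
import heapq
from dataclasses import dataclass

@dataclass
class Node:
    demerits: float

def prune_every_x(active_nodes, line_number, x):
    """Every x lines, keep only the best active node, so that each run
    of x lines is determined by a local total fit."""
    if line_number % x == 0:
        return [min(active_nodes, key=lambda n: n.demerits)]
    return active_nodes

def cap_active_nodes(active_nodes, limit):
    """Bound memory instead: once the limit is reached, discard the
    nodes with the highest demerits first."""
    if len(active_nodes) <= limit:
        return active_nodes
    return heapq.nsmallest(limit, active_nodes, key=lambda n: n.demerits)
```

The first variant gives a paragraph made of a sum of local optimums; the second one leaves the pruning point entirely to a memory budget, with the open question of what limit to choose.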
At any rate, good ideas sometimes emerge from the oddest experiments, so feel free to continue your investigations...

> The X value determines the trade-off between good aesthetic results
> and time/memory performance: a high value should give results similar
> to total-fit, a low value, say 1, should give the fastest performance.
>
> This approach would make it possible to finalize (create the areas,
> render) the page that has been chosen as the new root and, if
> possible, to free memory that is no longer necessary (including
> FO-tree objects and the related LMs). This possibility apart, it
> seems to me that in general the new prototype can benefit from this
> approach.
>
> The implementation adds a low overhead in terms of memory and
> computation that is largely outweighed by the effect of pruning. Each
> LM has a new pointer (prevPage, pointing to the previous page layout
> in the path to the root) and the pageLM has three ints for
> determining the time of pruning (N-X). The PageLM updates these
> integers each time completedPart() is called. After calling
> completedPart(), an LM asks the PageLM whether it's time to prune; if
> so, it performs the following operations on each active layout (AL):
>
> * extract the N-X page by walking the prevPage links (only X steps,
>   not all the layouts in this branch);
> * if it is the best page layout found so far, create a new
>   ActiveLayouts object (possibly replacing the old one) and add the
>   AL to it;
> * if it is the same best page layout, simply add the AL to the
>   ActiveLayouts object previously created;
> * otherwise do nothing; this AL will be discarded.
>
> Finally the new ActiveLayouts will replace the previous layout tree.
>
> Comments are very welcome.
> Dario

Vincent
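P.S. Just to check that I’ve read the pruning step correctly, here is how I would sketch it in Python — all the names (Layout, ancestor_at, prune) are mine, this is not FOP code:

```python
class Layout:
    """One node of the layout tree; prevPage links back toward the root,
    as in the patch's per-LM pointer."""
    def __init__(self, page_index, demerits, prev_page=None):
        self.page_index = page_index
        self.demerits = demerits
        self.prev_page = prev_page

def ancestor_at(layout, page_index):
    """Walk the prevPage links until the layout for the given page
    (only a few steps, not the whole branch)."""
    node = layout
    while node is not None and node.page_index > page_index:
        node = node.prev_page
    return node

def prune(active_layouts, cut_index):
    """Keep only the active layouts descending from the best (lowest
    demerits) page layout at cut_index, following the steps above."""
    best_root = None
    kept = []
    for al in active_layouts:
        root = ancestor_at(al, cut_index)       # extract the N-X page
        if best_root is None or root.demerits < best_root.demerits:
            best_root = root                    # best layout so far:
            kept = [al]                         # new ActiveLayouts
        elif root is best_root:
            kept.append(al)                     # same best root: keep AL
        # otherwise: do nothing, this AL is discarded
    return best_root, kept
```

If I got it right, the second and third bullets map onto the two branches in the loop, and the returned list is the new ActiveLayouts that replaces the previous layout tree.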