How do I remove myself from this list? 


From: Vincent Hennebert <[EMAIL PROTECTED]>
Sent: Monday, October 27, 2008 1:01:43 PM
Subject: Re: Layouts tree pruning for the prototype

Hi Dario,

I’ve finally had the time to look at your patch. IIUC your idea is to do
best-fit on the N but X lines, and total-fit on the final X lines.
I have to say, I’m a bit doubtful of the interest of this approach in
terms of performance and quality, compared to a plain best-fit version
(see below):

Dario Laera wrote:
> Hi Vincent,
> I wrote a new patch that implements a kind of ?-fit algorithm at
> page-level. Let N be the page number for the last page layout obtained
> and X a variable representing the depth of the layout tree I want to
> keep: each time N gets increased (and its value is greater than X) I
> perform a pruning of the layout tree choosing as new root the best
> layout for the page with index N-X. Here is an example: the two graphs
> below represents the trees of possible page layouts, in the first tree
> we have N=4. If X was set to 3, it's time to prune the tree by choosing
> the best subtree with a page-1 layout as root. In the example I suppose
> that 1a has lower demerits than 1b, so I prune the tree from 1a.
> Alternatively I can choose the page-1 layout that has the best leaf as
> child (actually not implemented).
>       ____0____
>      /         \
>    *1a*         1b
>    /  \         |
>   /    \        |
>  2a     2b      2c
> /  \   /  \    /  \
> 3a  3b 3c  3d  3e  3f
> |
> 4a
>     1a
>    /  \
>   /    \
>  2a     2b
> /  \   /  \
> 3a  3b 3c  3d
> |
> 4a

In this example (X = 3), total-fit is performed until a first layout for
line 4 is found. Then the best first line is chosen and the tree is cut
accordingly (1b and descendants). When a first layout for line 5 is
found, same again: total-fit is performed on lines 2 to 5, giving the
tree that has 1a as root, then the best second line is chosen and the
rest is cut off (say, 2a and descendants). And so on and so on.

So, roughly speaking, given a paragraph of n lines, you’re doing (n - x)
“local”, x line wide total fits, along with (n - x) best fits. In the
end result, (n - x) lines will have been chosen using the best-fit
method (despite the total fits running in parallel), and the final
x lines using a total fit using line (n - x - 1) as the root.

This sounds like a lot of work for a result that in the end may not look
much better than best-fit. Also, the visual result might look strange,
with the final lines looking more “even” than the starting ones (but
this has to be confirmed on real testcases).

I’m not sure there’s a real-life use case for such an optimization:
people wanting speed will still be less happy than with plain best-fit,
people wanting quality are ready to pay the price for it anyway.

A thing that might be interesting is to regularly cut down with the
number of active nodes, every x lines: once all of the layouts for
line x have been found, select the best active node and discard all the
other ones. Then do that again for line x + x, 3x, 4x, etc. While
similar, it has the advantage that every bunch of x lines will have been
determined by a local total-fit method. In fact the paragraph will be
made of a sum of local optimums, that may actually correspond to the
global optimum or not. But even in that case, I’m not sure this is worth
the additional complexity to a piece of code that’s already well
complicated enough.

Another option might be to set a limit to the amount of consumed memory
(the number of active nodes, which in turn will have an effect on the
processing time). Once the maximal number of active nodes is reached,
start discarding the nodes with highest demerits. But it remains to see
if such a heuristic proves to be efficient, and what limit to set up. As
we can see in your other message, figures may change radically from one
document to the other.

In the end, I would be more inclined to implement a gradation in the
layout quality (best fit, total fit over page sequence, total fit over
document, total fit over document + paragraph line numbers, etc.),
rather than one or several pruning method. I think it should be easier
to implement yet provide enough flexibility to satisfy all kinds of

Sorry if that sounds a bit negative. Since this is all but a simple
topic, I may well have missed the interest of your approach. At any rate
good ideas sometimes emerge from the oddest experiments, so feel free to
continue your investigations...

> The X value determine the trade off between good aesthetic results and
> time/memory performance: an high value should obtain results similar to
> total-fit, a low value, say 1, should obtain the fastest performance.
> This approach would allow to finalize (create the areas, render) the
> page that has been chosen as new root and, if possible, to free memory
> that is no more necessary (also FO-tree object and relative LM). This
> possibility apart, it seems to me that in general the new prototype can
> benefit from this approach.
> The implementation adds a low overhead in terms of memory and
> computation that is largely countervailed by the effect of pruning. Each
> LM has a new pointer (prevPage, pointing to the previous page layout in
> the path to the root) and the pageLM has three int for determining the
> time of pruning (N-X). The PageLM updates this integers each time a
> completedPart() is called. After calling the completedPart a LM ask the
> PageLM if it's time to prune; if so it performs the following operation
> on each active layout (AL):
>  * extract the N-X page by walking down the prevPage links (only X step,
> not all layouts in this branch);
>  * if it is the best page layout found until now, create a new
> ActiveLayouts object (eventually replacing the old one) and add to it
> the AL;
>  * if it is the same best page layout, simply add the AL to the
> ActiveLayouts object previously created;
>  * otherwise do nothing, this AL will be discarded.
> Finally the new ActiveLayouts will replace the previous layout tree.
> Comments are very welcome.
> Dario


Reply via email to