On 30 Oct 2008, at 12:45, Vincent Hennebert wrote:
Let me re-phrase your words to be sure I understand you correctly: once
all of the layouts for line X have been found, you choose the best of
them, select its corresponding ancestor layout for line 1, and discard
all the other layouts for line 1, along with their descendants. Is that
right?

Yes, that's right.
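
To make it concrete, the step I have in mind is roughly the following
(only a sketch with made-up names; these are not the actual classes of
the patch in trunk):

    import java.util.ArrayList;
    import java.util.List;

    class LineLayout {
        final LineLayout parent;   // layout of the previous line, null for line 1
        final double demerits;     // accumulated demerits up to this line

        LineLayout(LineLayout parent, double demerits) {
            this.parent = parent;
            this.demerits = demerits;
        }

        /** Walks up the chain of previous lines to the layout of line 1. */
        LineLayout firstLineAncestor() {
            LineLayout node = this;
            while (node.parent != null) {
                node = node.parent;
            }
            return node;
        }
    }

    class LinePruning {
        /**
         * Once all of the layouts for line X have been found: choose the
         * best one, select its ancestor layout for line 1, and keep only
         * the layouts descending from that ancestor; all the others (and
         * their descendants) are discarded.
         */
        static List<LineLayout> pruneAtLine(List<LineLayout> layoutsForLineX) {
            LineLayout best = layoutsForLineX.get(0);
            for (LineLayout layout : layoutsForLineX) {
                if (layout.demerits < best.demerits) {
                    best = layout;
                }
            }
            LineLayout keptFirstLine = best.firstLineAncestor();
            List<LineLayout> kept = new ArrayList<LineLayout>();
            for (LineLayout layout : layoutsForLineX) {
                if (layout.firstLineAncestor() == keptFirstLine) {
                    kept.add(layout);
                }
            }
            return kept;
        }
    }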

It would be great if you could find a similar image to illustrate your
heuristic. It’s important to ‘feel’ why it’s going to work.

Yes, it would be very nice; unfortunately, ATM I don't have such an image.




I’m not sure there’s a real-life use case for such an optimization:
people wanting speed will still be less happy than with plain best-fit,
and people wanting quality are ready to pay the price for it anyway.

This is correct, but the total-total fit (document and paragraph)
algorithm the prototype is actually implementing may be very
resource-hungry; in the document you recently wrote you mention
a paragraph that can be laid out in 4, 5 and 6 lines, but what if the
document contains many paragraphs like those I used to test the pruning
in trunk? With pruning (which can easily be turned on/off) you may
partially keep the advantages of total-total fit.

Not if you apply the pruning method at the line level: indeed the first
line is unlikely to be the same in, say, the 4-line version of
a paragraph, as in its 5-line version. Since you’ll be keeping only one
first line, it’s going to be the line belonging to the optimal final
layout (say, the 5-line version).

You might partially keep the advantages of total-total fit if you apply
pruning only at the page level, but that doesn’t sound right to me: if
you apply pruning, that means that you are ready to make a compromise on
quality in the interest of performance. But then you don’t need that
level of detail brought by the several layout possibilities of
a paragraph. Actually I expect that to be enabled only by people wanting
quality at any price.


A thing that might be interesting is to regularly cut down the number of
active nodes, every x lines: once all of the layouts for line x have
been found, select the best active node and discard all the other ones.
Then do that again for line 2x, 3x, 4x, etc. While similar, it has the
advantage that every chunk of x lines will have been determined by
a local total-fit method. In fact the paragraph will be made of a sum of
local optima, which may or may not correspond to the global optimum. But
even in that case, I’m not sure this is worth the additional complexity
in a piece of code that’s already complicated enough.
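
Roughly what I mean, as a sketch only (ActiveNode and totalDemerits are
made-up names, nothing close to the actual breaking code):

    import java.util.List;

    class ActiveNode {
        double totalDemerits;   // accumulated demerits of the layout ending here
        // ... break position, pointer to the previous node, etc.
    }

    class LocalTotalFit {
        /**
         * Every x lines, collapse the set of active nodes to the single
         * best one, so that each chunk of x lines is determined by a local
         * total-fit pass starting from the node kept at the previous cut.
         */
        static void cutDown(List<ActiveNode> activeNodes, int currentLine, int x) {
            if (currentLine % x != 0 || activeNodes.size() <= 1) {
                return;
            }
            ActiveNode best = activeNodes.get(0);
            for (ActiveNode node : activeNodes) {
                if (node.totalDemerits < best.totalDemerits) {
                    best = node;
                }
            }
            activeNodes.clear();
            activeNodes.add(best);
        }
    }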


Another option might be to set a limit to the amount of consumed memory
(the number of active nodes, which in turn will have an effect on the
processing time). Once the maximal number of active nodes is reached,
start discarding the nodes with the highest demerits. But it remains to
be seen whether such a heuristic proves to be efficient, and what limit
to set. As we can see in your other message, figures may change
radically from one document to another.
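
Something like this, as a sketch only (again with made-up names; the
right value for the limit is precisely what remains to be found, and
a priority queue would avoid re-sorting, but this keeps the sketch
short):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    class BoundedActiveNodeSet {
        // Same made-up ActiveNode as in the previous sketch: only the
        // totalDemerits field matters here.
        static class ActiveNode {
            double totalDemerits;
        }

        private final int maxActiveNodes;
        private final List<ActiveNode> nodes = new ArrayList<ActiveNode>();

        BoundedActiveNodeSet(int maxActiveNodes) {
            this.maxActiveNodes = maxActiveNodes;
        }

        void add(ActiveNode node) {
            nodes.add(node);
            if (nodes.size() > maxActiveNodes) {
                // Discard the nodes with the highest demerits: at this
                // point they are unlikely to lead to the final solution.
                Collections.sort(nodes, new Comparator<ActiveNode>() {
                    public int compare(ActiveNode a, ActiveNode b) {
                        return Double.compare(a.totalDemerits, b.totalDemerits);
                    }
                });
                nodes.subList(maxActiveNodes, nodes.size()).clear();
            }
        }
    }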

I think that pruning provides a good output-quality/performance
trade-off without messing up the code too much (look at the patch for
pruning in trunk, it's trivial). Anyway, pruning isn't the best way to
cut down the number of active layouts; the solutions you propose are
more efficient, but I wrote the pruning for another reason (more later)
and here we are talking about its (positive) side effect.

In the end, I would be more inclined to implement a gradation in the
layout quality (best fit, total fit over page sequence, total fit over
document, total fit over document + paragraph line numbers, etc.),
rather than one or several pruning methods. I think it should be easier
to implement yet provide enough flexibility to satisfy all kinds of
users.

Sorry if that sounds a bit negative. Since this is anything but a simple
topic, I may well have missed the interest of your approach. At any rate
good ideas sometimes emerge from the oddest experiments, so feel free to
continue your investigations...

About my main goal (start imagining violins playing a moving song):
I dream of an FO processor that can process documents in "streaming"
fashion, using a constant amount of memory regardless of the size of the
input document (stop imagining... :P). Probably some nice features would
need to be dropped, or maybe it's simply not possible, but I think this
is an interesting challenge. Anyway, I'm far from reaching this goal.

Why a streaming memory-efficient processor? Because the industry needs
it: XML and related technologies are beautiful, but sometimes too
academic and not production-oriented. My company needs it (I hope they
will allow me to continue working on this...). A proof of the market's
interest in this direction is XF Ultrascale, a commercial FO processor
that claims a very low memory footprint [1]. It managed to obtain
impressive results in some tests, but it fails to render my company's
test, going out of memory. So it seems that a streaming processor still
doesn't exist, even among commercial products.

Hmmm, there are two different points then IMO:
- rendering pages as soon as possible; I suddenly see the point of your
pruning approach. However, for that case I believe best-fit is the way
to go.
- limiting memory consumption. This is a different issue IMO. I’d go for
setting a maximum number of nodes and regularly discarding the nodes
with highest demerits. The rationale being that, at the point where
there are so many active nodes, it’s unlikely that the ones with
highest demerits will actually lead to the final solution. Unless the
content that’s following really is tricky (big unbreakable blocks or
whatever).

It's not only a question of active nodes for memory consumption: how big
is the in-memory representation of the FO tree of a 500MB document?
Three, four, five times the original size? And what about all its LMs?
Rendering pages ASAP (or at least building the area tree) is necessary
if you want to minimize this footprint by freeing information that is no
longer necessary. In that case the active node reduction is the least of
the benefits. The idea of applying pruning was born with this in mind,
and that's why I think pruning is good.

Well, my hope is to find an approach that makes everyone happy, those
wanting speed as well as those wanting quality, with a series of
intermediate steps providing different trade-offs. The difference would
be in the number of times the tree is pruned, but the pruning itself
would always be the same: when the end of an object is reached (block,
page-sequence, flow), select the best layout so far and discard the
other ones. I believe this method will better integrate into the big
picture.
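
In code the idea would look roughly like this (a sketch with invented
names; the point is only that the pruning step is always the same, and
the quality setting only decides at which boundaries it is applied):

    import java.util.ArrayList;
    import java.util.List;

    class GradedPruning {
        enum Quality { BEST_FIT, TOTAL_FIT_PAGE_SEQUENCE, TOTAL_FIT_DOCUMENT }
        enum Boundary { END_OF_BLOCK, END_OF_PAGE_SEQUENCE, END_OF_FLOW }

        static class LayoutAlternative {
            double totalDemerits;
        }

        private final Quality quality;
        private final List<LayoutAlternative> alternatives =
                new ArrayList<LayoutAlternative>();

        GradedPruning(Quality quality) {
            this.quality = quality;
        }

        /** The pruning itself, always the same: keep the best layout so far. */
        private void keepBest() {
            if (alternatives.size() <= 1) {
                return;
            }
            LayoutAlternative best = alternatives.get(0);
            for (LayoutAlternative alt : alternatives) {
                if (alt.totalDemerits < best.totalDemerits) {
                    best = alt;
                }
            }
            alternatives.clear();
            alternatives.add(best);
        }

        /** Called whenever the end of a block, page-sequence or flow is reached. */
        void onBoundary(Boundary boundary) {
            switch (quality) {
            case BEST_FIT:
                keepBest();     // prune at every boundary
                break;
            case TOTAL_FIT_PAGE_SEQUENCE:
                if (boundary != Boundary.END_OF_BLOCK) {
                    keepBest(); // keep alternatives within a page-sequence
                }
                break;
            case TOTAL_FIT_DOCUMENT:
                if (boundary == Boundary.END_OF_FLOW) {
                    keepBest(); // keep alternatives until the very end
                }
                break;
            }
        }
    }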

Now, the definitive way to compare approaches is to run them on
a representative set of documents, and for each one record the
processing time, memory consumption and demerits of the final layout.
I’d be curious to see the difference between best-fit and the pruning
approach then, but I’m not convinced it’s going to be that big.
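
The kind of harness I have in mind would be roughly this (a sketch only;
LayoutStrategy and its method are invented for illustration, and the
memory figure is only a crude approximation):

    import java.util.List;
    import java.util.Map;

    class Comparison {

        /** One breaking strategy (best-fit, pruning, local total-fit...). */
        interface LayoutStrategy {
            /** Lays out the document and returns the demerits of the final layout. */
            double layout(String documentPath);
        }

        static void compare(List<String> documents,
                            Map<String, LayoutStrategy> strategies) {
            Runtime runtime = Runtime.getRuntime();
            for (String doc : documents) {
                for (Map.Entry<String, LayoutStrategy> entry : strategies.entrySet()) {
                    runtime.gc();
                    long memoryBefore = runtime.totalMemory() - runtime.freeMemory();
                    long start = System.currentTimeMillis();

                    double demerits = entry.getValue().layout(doc);

                    long elapsed = System.currentTimeMillis() - start;
                    long memoryAfter = runtime.totalMemory() - runtime.freeMemory();
                    System.out.println(doc + " / " + entry.getKey() + ": "
                            + elapsed + " ms, "
                            + (memoryAfter - memoryBefore) + " bytes (approx.), "
                            + demerits + " demerits");
                }
            }
        }
    }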


Do you care about streaming processing?
Pruning is *necessary* (along with mixed line/page breaking) if you want
to render pages before the end of the document is reached while keeping
the output quality better than best-fit's.

Don't you care about streaming processing?
Pruning is a way of improving performance, though surely not the most
efficient one. Keeping down the number of active nodes (with pruning or
other solutions) is not necessary, and maybe its benefits in real-life
use are not enough to make it so desirable.

I expect you and the FOP team don't care, at least ATM, about streaming
processing. Anyway, thank you for your feedback; you spent time on it.

So far, our answer to the streaming problem has simply been best-fit.
Now, there may be two other solutions: your pruning approach, and my
proposal of local total-fits (keeping the best layout every
x lines/pages). It remains to be seen whether the quality brought by
those latter two methods is sufficiently higher than best-fit's to
justify the additional processing resources (and code).

Best-fit obviously would work, but I have the *feeling* that pruning or
local total-fit would achieve better results. I need to prove it; the
demerits comparison you mentioned above would be very interesting in
this respect. I also have the feeling that pruning should work better
than local total-fit in some cases: think of doing local total-fit every
X lines on a paragraph that is X + 2 lines long: those last two lines
may be forced to high demerits. But, again, tests are needed to prove
the feelings.

Dario
