Hi Dario,

Dario Laera wrote:
> Hi Vincent!
>
> Except for one concept, I may agree with you: it's all about points of
> view, what you are interested in. Read the inline comments.
>
> On 27 Oct 2008, at 18:01, Vincent Hennebert wrote:
>> I’ve finally had the time to look at your patch. IIUC your idea is to
>> do best-fit on all but the last X lines, and total-fit on those final
>> X lines.
>
> That's what the patch I wrote actually does, but it's not what I was
> intending. I fixed this problem in the pruning code I wrote for trunk
> by choosing the first node that has the best leaf as a child. This is
> neither best fit nor total fit: it's a step forward from best fit and
> a step backward from total fit.
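If I understand the trunk patch correctly, the idea might be sketched
like this (a toy Python illustration, not your actual code; `Node`,
`best_leaf` and `prune_to_best_subtree` are names I made up):

```python
# Toy sketch of the pruning idea (hypothetical, not the actual FOP
# patch): candidate layouts form a tree whose leaves are the layouts
# for line X; keep only the first-line subtree containing the best leaf.

class Node:
    def __init__(self, demerits, children=()):
        self.demerits = demerits
        self.children = list(children)

def best_leaf(node):
    """Lowest demerits among the leaves reachable from this node."""
    if not node.children:
        return node.demerits
    return min(best_leaf(child) for child in node.children)

def prune_to_best_subtree(root):
    """Keep only the child (first-line layout) whose subtree contains
    the best leaf; all the other subtrees are discarded."""
    root.children = [min(root.children, key=best_leaf)]
    return root
```

With something like this, only one first-line layout survives once line
X has been laid out, so memory stays bounded, at the price of some
quality.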
Let me re-phrase your words to be sure I understand you correctly: once
all of the layouts for line X have been found, you choose the best of
them, select its corresponding ancestor layout for line 1, and discard
all the other layouts for line 1, along with their descendants. Is that
right?

> If X is big enough there's a high probability that the first line
> layout won't lead to bad layouts for the immediately following lines:
> again, this is not best fit. And the last X lines are laid out in
> total fit. Do you agree?

Hmmm, I think I’m starting to see what you’re trying to achieve. Giving
your rationale below helped anyway. Let me re-phrase again: since the
best layout for line X is made of this particular first line, there’s a
reasonable chance that that first line won’t lead to a bad final
layout. That seems to make sense. But since, starting from line X,
you’re doing this pruning on every line, I’m not really sure this
heuristic can actually be explained like this. The problem, if we can’t
find a good image to explain a heuristic, is that it becomes difficult
to convince ourselves that the heuristic does the job (here, limiting
memory consumption while producing better layout quality than
best-fit).

First-fit and best-fit are easily illustrated: imagine you have to go
from city A to city B, making night breaks in hotels, driving between 8
and 10 hours a day. First-fit is: stop at the first hotel you can find
that fits within that driving-hours interval. The next day, do the
same, and so on, until B is reached. Best-fit is: find all of the
hotels that fit in the interval; select the one whose corresponding
journey lowers petrol consumption (that’s important these days ;-) ).
Next day, same, etc. Total-fit is: ok, on the first two days we’ll have
to cross that huge mountain chain, it’s going to be slow, painful,
tiring and petrol-consuming. But after that it’s all brand-new, empty,
flat motorway until the end. Piece of cake.
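In toy code, the difference between first-fit and total-fit can be made
concrete like this (an illustrative Python sketch under simplifying
assumptions: fixed-width characters, badness measured as squared
leftover space, every line counted; all names are made up, this is not
FOP’s Knuth-style implementation):

```python
INF = float("inf")

def line_cost(words, width):
    """Badness of one line: squared unused space, INF if it overflows."""
    length = sum(len(w) for w in words) + len(words) - 1  # words + spaces
    return INF if length > width else (width - length) ** 2

def first_fit(words, width):
    """Greedy: fill each line as much as possible, never look back."""
    lines, cur = [], []
    for w in words:
        if line_cost(cur + [w], width) == INF:
            lines.append(" ".join(cur))
            cur = [w]
        else:
            cur.append(w)
    lines.append(" ".join(cur))
    return lines

def total_fit(words, width):
    """Dynamic programming: minimise the total badness over all lines."""
    n = len(words)
    best = [None] * (n + 1)  # best[i] = (cost of words[i:], next break)
    best[n] = (0, n)
    for i in range(n - 1, -1, -1):
        best[i] = (INF, n)
        for j in range(i + 1, n + 1):
            c = line_cost(words[i:j], width)
            if c == INF:
                break
            if c + best[j][0] < best[i][0]:
                best[i] = (c + best[j][0], j)
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:best[i][1]]))
        i = best[i][1]
    return lines
```

On `"aaa bb cc ddddd"` with width 6, first-fit greedily fills the first
line and ends up with a very loose second line, while total-fit accepts
a slightly looser first line to get a better paragraph overall.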
It would be great if you could find a similar image to illustrate your
heuristic. It’s important to ‘feel’ why it’s going to work.

>> I’m not sure there’s a real-life use case for such an optimization:
>> people wanting speed will still be less happy than with plain
>> best-fit, people wanting quality are ready to pay the price for it
>> anyway.
>
> This is correct, but the total-total-fit (document and paragraph)
> algorithm the prototype is actually implementing may be very hungry
> for resources; in the document you recently wrote you mention a
> paragraph that can be laid out in 4, 5 or 6 lines, but what if the
> document contains many paragraphs like those I used to test the
> pruning in trunk? With pruning (which can easily be turned on/off) you
> may partially keep the advantages of total-total fit.

Not if you apply the pruning method at the line level: indeed, the
first line is unlikely to be the same in, say, the 4-line version of a
paragraph as in its 5-line version. Since you’ll be keeping only one
first line, it’s going to be the line belonging to the optimal final
layout (say, the 5-line version).

You might partially keep the advantages of total-total-fit if you apply
pruning only at the page level, but that doesn’t sound right to me: if
you apply pruning, that means you are ready to make a compromise on
quality in the interest of performance. But then you don’t need the
level of detail brought by the several layout possibilities of a
paragraph. Actually I expect that to be enabled only by people wanting
quality at any price.

>> A thing that might be interesting is to regularly cut down the
>> number of active nodes, every x lines: once all of the layouts for
>> line x have been found, select the best active node and discard all
>> the other ones. Then do that again for lines 2x, 3x, 4x, etc. While
>> similar, it has the advantage that every bunch of x lines will have
>> been determined by a local total-fit method.
>> In fact the paragraph will be made of a sum of local optimums, which
>> may or may not correspond to the global optimum. But even in that
>> case, I’m not sure this is worth the additional complexity to a
>> piece of code that’s already well complicated enough.
>>
>> Another option might be to set a limit on the amount of consumed
>> memory (the number of active nodes, which in turn will have an
>> effect on the processing time). Once the maximal number of active
>> nodes is reached, start discarding the nodes with the highest
>> demerits. But it remains to be seen whether such a heuristic proves
>> to be efficient, and what limit to set. As we can see in your other
>> message, figures may change radically from one document to another.
>
> I think that pruning provides a good output-quality/performance
> trade-off without messing up the code too much (look at the patch for
> pruning in trunk, it's trivial). Anyway, pruning isn't the best way
> to cut down the number of active layouts; the solutions you propose
> are more efficient. But I wrote the pruning for another reason (more
> later), and here we are talking about its (positive) side effect.
>
>> In the end, I would be more inclined to implement a gradation in the
>> layout quality (best fit, total fit over page sequence, total fit
>> over document, total fit over document + paragraph line numbers,
>> etc.), rather than one or several pruning methods. I think it should
>> be easier to implement yet provide enough flexibility to satisfy all
>> kinds of users.
>>
>> Sorry if that sounds a bit negative. Since this is anything but a
>> simple topic, I may well have missed the interest of your approach.
>> At any rate, good ideas sometimes emerge from the oddest
>> experiments, so feel free to continue your investigations...
> About my main goal (start imagining violins playing a moving song): I
> dream of an FO processor that can process documents in "streaming",
> using a constant amount of memory regardless of the size of the input
> document (stop imagining... :P). Probably some nice features would
> need to be dropped, or it's simply not possible, but I think this is
> an interesting challenge. Anyway, I'm far from reaching this goal.
>
> Why a streaming, memory-efficient processor? Because the industry
> needs it: XML and related technologies are beautiful but sometimes
> too academic and not production-oriented. My company needs it (I hope
> they will allow me to continue working on this...). A proof of the
> market interest in this direction is XF Ultrascale, a commercial FO
> processor that claims a very low memory footprint. It managed to
> obtain impressive results in some tests, but it fails to render my
> company's test, going out of memory. So it seems that a streaming
> processor still doesn't exist, even among commercial products.

Hmmm, there are two different points then, IMO:
- rendering pages as soon as possible; I suddenly see the point of your
  pruning approach. However, for that case I believe best-fit is the
  way to go.
- limiting memory consumption. This is a different issue IMO. I’d go
  for setting a maximum number of nodes and regularly discarding the
  nodes with the highest demerits. The rationale being that, at the
  point where there are so many active nodes, it’s unlikely that the
  ones with the highest demerits will actually lead to the final
  solution. Unless the content that follows really is tricky (big
  unbreakable blocks or whatever).

> Obviously, you (and the FOP team) may consider streaming processing
> not interesting or a very low-priority goal. You probably have
> targets different from mine. Let's consider pruning from the two
> points of view.
Well, my hope is to find an approach that makes everyone happy, those
wanting speed as well as those wanting quality, with a series of
intermediate steps providing different trade-offs. The difference would
be in the number of times the tree is pruned, but the pruning itself
would always be the same: when the end of an object is reached (block,
page-sequence, flow), select the best layout so far and discard the
other ones. I believe this method will integrate better into the big
picture.

Now, the definitive way to compare approaches is to run them on a
representative set of documents, and for each one record the processing
time, memory consumption and demerits of the final layout. I’d be
curious to see the difference between best-fit and the pruning approach
then, but I’m not convinced it’s going to be that big.

> Do you care about streaming processing?
> Pruning is *necessary* (along with mixed line/page breaking) if you
> want to render pages before the end of the document is reached while
> keeping output quality better than best fit.
>
> Don't you care about streaming processing?
> Then pruning is a way of improving performance, surely not the most
> efficient one. Keeping down the number of active nodes (with pruning
> or other solutions) is not necessary, and maybe its benefits in
> real-life use are not enough to make it desirable.
>
> I expect you and the FOP team don't care, at least ATM, about
> streaming processing. Anyway, thank you for your feedback; you spent
> time on it.

So far, our answer to the streaming problem has simply been best-fit.
Now, there may be two other solutions: your pruning approach, and my
proposal of local total-fits (keeping the best layout every x
lines/pages). It remains to be seen whether the quality brought by
those latter two methods is sufficiently higher than best-fit’s to
justify the additional processing resources (and code).

Vincent
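P.S. Just to make the node-capping heuristic I mentioned concrete, a
minimal sketch (hypothetical names; demerits modelled as plain numbers,
nothing to do with the actual breaking code):

```python
import heapq

def cap_active_nodes(active, max_active):
    """Enforce a memory cap on the active-node list: once the limit is
    exceeded, keep only the max_active nodes with the lowest demerits
    and discard the rest. `active` is a list of (demerits, layout)
    pairs; lower demerits means a more promising partial layout."""
    if len(active) <= max_active:
        return active
    return heapq.nsmallest(max_active, active, key=lambda node: node[0])
```

The open question remains what value of max_active is safe when the
upcoming content really is tricky.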