Hi Stephan,

[Moving to fop-dev as we are diving in the gory details.]
[... And sorry for the delay.]

Some time ago I worked on a prototype implementation of a new layout
engine that would create layout elements on the fly. The main goal was
to address the changing-IPD problem, which, as you noticed, is not
properly handled by the current code. The idea was also to be able to
easily switch between a best-fit and a total-fit approach for page
breaking, by simply tweaking a few parameters.
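To make the contrast concrete, here is a toy sketch (not FOP code; the class, method, and parameter names are all made up for illustration) of why a best-fit breaker lends itself to on-the-fly layout: it can consume block heights lazily and ship each page as soon as it is full, whereas a total-fit breaker must see the entire element sequence before committing to any break.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy illustration only: a best-fit page breaker consumes blocks lazily,
// one page at a time; a total-fit breaker would need the complete list.
public class BestFitSketch {

    /** Greedily fills pages of the given BPD from a lazy stream of block heights. */
    static List<List<Integer>> breakPages(Iterator<Integer> blocks, int pageBpd) {
        List<List<Integer>> pages = new ArrayList<>();
        List<Integer> page = new ArrayList<>();
        int used = 0;
        while (blocks.hasNext()) {
            int h = blocks.next();
            if (used + h > pageBpd && !page.isEmpty()) {
                pages.add(page);          // ship the full page immediately
                page = new ArrayList<>();
                used = 0;
            }
            page.add(h);
            used += h;
        }
        if (!page.isEmpty()) {
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Integer> blocks = List.of(40, 30, 50, 20, 60, 10);
        System.out.println(breakPages(blocks.iterator(), 100));
        // [[40, 30], [50, 20], [60, 10]]
    }
}
```

Because each page is emitted as soon as it overflows, memory use stays proportional to one page of content rather than to the whole page-sequence.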

In that exercise, tables proved to be the most challenging part. The
current table-building algorithm relies on how much of the table would
end up after a page break, and sets the lengths of Knuth elements
accordingly. To do that it needs to assume that pages have a certain,
fixed width. This algorithm will probably have to be redesigned in
order to implement any kind of best-fit approach.
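The dependency on a fixed page width can be illustrated with a toy sketch (again, not FOP code; the method and its parameters are invented): a row's height, and hence the length of the Knuth box generated for it, depends on how many lines the cell content wraps into, which in turn depends on the assumed IPD.

```java
// Toy illustration only: the same cell content yields different element
// lengths at different IPDs, so lengths precomputed for one page width
// are wrong as soon as a page with another width comes along.
public class TableIpdSketch {

    /** Height of a row whose cell text wraps at the given cell IPD. */
    static int rowHeight(int textLength, int cellIpd, int lineBpd) {
        int lines = (textLength + cellIpd - 1) / cellIpd; // ceiling division
        return lines * lineBpd;
    }

    public static void main(String[] args) {
        System.out.println(rowHeight(200, 100, 12)); // 24: 2 lines on a wide page
        System.out.println(rowHeight(200, 50, 12));  // 48: 4 lines on a narrow page
    }
}
```

This is why any element list built for tables under a fixed-width assumption has to be discarded or rebuilt when the IPD changes mid-sequence.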

Tables are, in my view, /the/ key part of the layout engine; if you
can get tables working, all the rest will follow.

I wrote a document illustrating the challenges I faced, which you might
find interesting:
The code for the prototype is in the Temp_Interleaved_Page_Line_Breaking
branch. (You can ignore the early Ruby version.)

It’s horribly flawed and outdated but might give you some ideas for your
implementation. I have an improved version on my hard drive but haven’t
had a chance to clean it up and publish it yet.

Good luck,

On 16/08/11 03:23, Stephan Thesing wrote:
> Hello,
> indeed, as the code currently is, it will be hard to make this a first
> fit layout algorithm for pages.
> As the generation of KnuthElements (or ListElements) by the LayoutManagers
> seems to be quite interwoven with tasks like page alignment and other stuff, 
> I would rather not want to adapt this to a "demand driven" generation of 
> Elements as needed by a first fit approach.
> Also, the possible IPD changes between pages pose a problem (it is also a 
> problem for the current code, which is "not nice" for that case).
> I would rather change the layout managers to produce KnuthElements in the way 
> TeX does and leave page alignment to the page collection stage.
> I don't see another manageable way to do this but to add to the layout 
> managers a new interface for this demand driven approach.
> Essentially, this would result in a parallel implementation of generating 
> content to the getNextKnuthElements() and addAreas() interface.
> I can spend some effort, and since I clearly have a need for a scalable first-fit 
> page layout, I will give this a try.
> Best regards
>    Stephan
> PS: Is there any more in-depth documentation about the way the
> layout managers work apart from the Wiki Pages?
> -------- Original Message --------
>> Date: Wed, 3 Aug 2011 10:55:35 +0200
>> From: Simon Pepping <spepp...@leverkruid.eu>
>> To: fop-us...@xmlgraphics.apache.org
>> Subject: Re: FOP and large documents (again)
>> On Wed, Aug 03, 2011 at 10:23:48AM +0200, Stephan Thesing wrote:
>>> Looking at the code (as far as I understand it), for each page-sequence
>>> all KnuthElements are computed first by the layout managers.
>>> This is split only for forced page breaks.
>>> Then on the whole sequence, possible page break positions are searched
>> for.
>>> Only after this are the actual output areas computed and pages produced.
>>> Clearly, this doesn't scale for large page-sequences...
>>> Is there a reason why this approach was chosen, instead of "lazily" (or
>> on-demand) computing KnuthElements, putting them on the page and, as soon as
>> it is filled, passing it to the renderer?
>> Both line and page breaking use the Knuth algorithm of a total fit.
>> The algorithm requires the complete content before it can be applied.
>> Clearly TeX does not do this; for page breaking it uses a best fit
>> approach.
>> For FOP it would be better if it could apply either strategy, at the
>> demand of the user. But FOP is coded such that it first collects all
>> content, in the process doing all line breaking in paragraphs, before
>> it starts its page breaking algorithm. Therefore a best fit page
>> breaking algorithm does not solve the memory problem. Changing this so
>> that page breaking (best or total fit at the user's choice) is
>> considered while collecting content has proven too hard (or too
>> time-consuming) until now. See e.g.
>> http://svn.apache.org/viewvc/xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/.
>> There is a best fit page breaking algorithm, which is mainly used for
>> cases with varying page widths. But it is a hack in the sense that it
>> throws away all collected content beyond the current page, and
>> restarts the process.
>> So, help needed.
>> Simon
