Merry Christmas to all!

Simon, thanks for applying my patch and for your hints. The memory usage of
that map is now a few MB, down from around 2 GB initially, so further and
more complex changes are not worth making.

Alexis


On Dec 23, 2010, at 12:34 PM, Simon Pepping wrote:

> I implemented your suggestion in revision 1052214. Thanks.
> 
> To avoid keeping any references at all, you might implement a command
> line option. Alternatively, you might implement scanning the fo tree
> to check whether any page references are used. However, this would only
> work if there is only one page sequence in the fo file. There is no
> way to know that during processing, since FOP processes one page
> sequence at a time, without looking ahead in the fo file. Unlike
> LaTeX, FOP implements a single-pass process that includes page
> references. The price seems to be the memory needed to keep the
> necessary data in case any reference occurs.
> 
> Simon
> 
> On Wed, Dec 22, 2010 at 06:59:08PM +0200, Alexios Giotis wrote:
>> Hi fop-dev,
>> 
>> In one of my use cases, I create a PDF file of about 20000 pages from the
>> FOP intermediate format. I imagined this as a streaming process (e.g. read a
>> page in FOP_IF, write it to PDF and release memory), with the exception of
>> image caching. In reality, by analyzing a heap dump taken with the
>> -XX:+HeapDumpOnOutOfMemoryError parameter on a production server, I found
>> out that o.a.f.r.p.PDFDocumentHandler keeps a reference for every page, to
>> be used for bookmarks & outlines. In my case, the retained heap size of
>> every page is about 150 KB. If you multiply this by the number of pages, the
>> memory usage is large. Even worse, on my production server I have 10 threads
>> creating 20k-page documents in parallel.
>> 
>> Attached is a patch against the latest revision 1051938 of trunk that
>> considerably reduces the memory usage by keeping only a String pdfPageRef
>> instead of the full org.apache.fop.pdf.PDFReference object. This was
>> possible because we only need that string from the object. Ideally,
>> I would like not to keep the page references at all if bookmarks & outlines
>> are not used, or at least to keep them only for the pages that are actually
>> referenced. Is this possible? If so, do you have any hints for this?
>> 
>> If further optimizations are not possible or are too complex, then I guess
>> I will just open an issue and attach this patch. I hope you agree with the
>> addition of generics on the Map declaration and with the change of
>> "new Integer()" to "Integer.valueOf()" (findbugs performance warning).
