Merry Christmas to all!

Simon, thanks for applying my patch and for your hints. The memory usage of that map is now a few MB. Initially it was around 2 GB, so it's not worth making further, more complex changes.
Alexis

On Dec 23, 2010, at 12:34 PM, Simon Pepping wrote:

> I implemented your suggestion in revision 1052214. Thanks.
>
> To avoid keeping any references, you might implement a command
> line option. Alternatively, you might implement scanning the fo tree
> to check if any page references are used. However, this would only
> work if there is only one page sequence in the fo file. There is no
> way to know that during processing, since FOP processes one page
> sequence at a time, without looking forward in the fo file. Unlike
> LaTeX, FOP implements a one-time process including page references.
> The price seems to be the use of memory to keep the necessary data in
> case any reference occurs.
>
> Simon
>
> On Wed, Dec 22, 2010 at 06:59:08PM +0200, Alexios Giotis wrote:
>> Hi fop-dev,
>>
>> In one of my use cases, I create a PDF file having about 20000 pages from
>> FOP intermediate format. I imagined this as a streaming process (e.g. read a
>> page in FOP_IF, write it to PDF and release memory) with the exception of
>> caching of images. In reality, by analyzing a heap dump taken with the
>> -XX:+HeapDumpOnOutOfMemoryError parameter on a production server, I found
>> out that o.a.f.r.p.PDFDocumentHandler keeps for every page a reference to be
>> used for bookmarks & outlines. In my case, the retained heap size of every
>> page is about 150 KB. If you multiply this by the number of pages, the
>> memory usage is large. Even worse, on my production server I have 10 threads
>> creating 20k page documents in parallel.
>>
>> Attached is a patch against the latest revision 1051938 of trunk that
>> considerably reduces the memory usage by keeping only a String pdfPageRef
>> instead of the full org.apache.fop.pdf.PDFReference object. This was
>> possible because from the object we only need to get that string. Ideally,
>> I would like not to keep the page references at all if bookmarks & outlines
>> are not used. Or at least, keep them only for the pages that are indeed
>> referenced. Is this possible? If so, do you have any hints for this?
>>
>> If further optimizations are not possible or complex, then I guess I will
>> just open an issue and attach this patch. I hope you agree with the addition
>> of generics on the Map declaration and with the change of "new Integer()" to
>> "Integer.valueOf()" (findbugs performance warning).
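
For readers following the thread, the idea behind the patch described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual FOP code: PageRefSketch, HeavyPageReference and asPdfString() are hypothetical names standing in for the document handler, the full reference object and whatever call yields the reference string. The point is that the map holds only a short String per page, so the heavy object becomes eligible for garbage collection once the page is written.

```java
import java.util.HashMap;
import java.util.Map;

public class PageRefSketch {

    /** Hypothetical stand-in for a reference object that retains a whole page. */
    public static final class HeavyPageReference {
        private final int objectNumber;
        // Simulates the page data kept reachable through the reference (assumption).
        private final byte[] retainedPageData;

        public HeavyPageReference(int objectNumber, int retainedBytes) {
            this.objectNumber = objectNumber;
            this.retainedPageData = new byte[retainedBytes];
        }

        /** Returns an indirect reference in PDF syntax, e.g. "12 0 R". */
        public String asPdfString() {
            return objectNumber + " 0 R";
        }
    }

    // Typed map: page index -> reference string only, never the heavy object.
    private final Map<Integer, String> pageReferences = new HashMap<>();

    public void registerPage(int pageIndex, HeavyPageReference ref) {
        // Integer.valueOf may reuse cached instances, unlike "new Integer(...)".
        pageReferences.put(Integer.valueOf(pageIndex), ref.asPdfString());
        // 'ref' is not stored, so it can be collected after this call returns.
    }

    public String getPageReference(int pageIndex) {
        return pageReferences.get(Integer.valueOf(pageIndex));
    }

    public int size() {
        return pageReferences.size();
    }

    public static void main(String[] args) {
        PageRefSketch registry = new PageRefSketch();
        for (int i = 0; i < 1000; i++) {
            // About 150 KB per page, as in the heap dump figures quoted above.
            registry.registerPage(i, new HeavyPageReference(i + 1, 150 * 1024));
        }
        System.out.println(registry.getPageReference(0)); // prints "1 0 R"
        System.out.println(registry.size());              // prints "1000"
    }
}
```

Bookmarks and outlines can still look up the string by page index afterwards. Dropping the map entirely when no page references are used would need the extra knowledge Simon mentions (a command line option or a scan of the fo tree), which this sketch does not attempt.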