I implemented your suggestion in revision 1052214. Thanks.

To avoid keeping any references at all, you could implement a command line option. Alternatively, you could scan the FO tree to check whether any page references are used. However, that would only work if the FO file contains a single page sequence. There is no way to know this in advance during processing, since FOP processes one page sequence at a time, without looking ahead in the FO file.

Unlike LaTeX, FOP resolves page references in a single run. The price seems to be the memory needed to keep the necessary data in case a reference occurs.
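
To make the two ideas in this thread concrete (keeping only the string form of a page reference, and switching retention off altogether), here is a minimal sketch. It is not FOP code: the class PageReferenceRegistry, the keepReferences flag and the sample reference "12 0 R" are all invented for illustration, and the real PDFDocumentHandler may be organized quite differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of FOP: remembers page references as plain strings. */
final class PageReferenceRegistry {

    // Page index -> textual PDF reference such as "12 0 R" (a String, not a full object).
    private final Map<Integer, String> pageRefs = new HashMap<Integer, String>();

    // Hypothetical switch, e.g. set from a command line option.
    private final boolean keepReferences;

    PageReferenceRegistry(boolean keepReferences) {
        this.keepReferences = keepReferences;
    }

    /** Records the reference for a page unless retention is switched off. */
    void register(int pageIndex, String pdfPageRef) {
        if (keepReferences) {
            // Integer.valueOf may reuse cached instances, unlike new Integer().
            pageRefs.put(Integer.valueOf(pageIndex), pdfPageRef);
        }
    }

    /** Returns the stored reference, or null if none was kept for this page. */
    String lookup(int pageIndex) {
        return pageRefs.get(Integer.valueOf(pageIndex));
    }

    /** Number of references currently retained. */
    int size() {
        return pageRefs.size();
    }
}
```

With the flag off, nothing is retained per page, so memory stays flat regardless of page count. The cost is that bookmarks and outlines pointing at those pages could no longer be resolved, which is why such a switch would have to be an explicit user choice.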
Simon

On Wed, Dec 22, 2010 at 06:59:08PM +0200, Alexios Giotis wrote:
> Hi fop-dev,
>
> In one of my use cases, I create a PDF file having about 20000 pages from FOP
> intermediate format. I imagined this as a streaming process (e.g. read a page
> in FOP_IF, write it to PDF and release memory) with the exception of caching
> of images. In reality, by analyzing a heap dump taken with the
> -XX:+HeapDumpOnOutOfMemoryError parameter on a production server, I found out
> that o.a.f.r.p.PDFDocumentHandler keeps for every page a reference to be used
> for bookmarks & outlines. In my case, the retained heap size of every page
> is about 150kb. If you multiply this with the number of pages, the memory
> usage is large. Even worse, on my production server I have 10 threads
> creating 20k page documents in parallel.
>
> Attached is a patch against the latest revision 1051938 of trunk that
> considerably reduces the memory usage by keeping only a String pdfPageRef
> instead of the full org.apache.fop.pdf.PDFReference object. This was possible
> because from the object we only need to get that string. Ideally, I would
> like not to keep at all the page references if bookmarks & outlines are not
> used. Or at least, keep it only for the pages that are indeed referenced. Is
> this possible ? If so, do you have any hints for this ?
>
> If further optimizations are not possible or complex, then I guess I will
> just open an issue and attach this patch. I hope you agree with the addition
> of generics on the Map declaration and with the change of "new Integer()" to
> "Integer.valueOf())" (findbugs performance warning).