Piyush Khandelwal commented on FOP-2937:

Apart from the change that I suggested in the description, I also changed the 
entries HashMap in the PDFDictionary class (which PDFPage and PDFPageLabels 
extend) to a WeakHashMap. This change on its own did not bring much benefit, 
but combined with the change above it showed some improvement.

Adding a patch for that as well.
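
A caveat when evaluating the WeakHashMap change: java.util.WeakHashMap holds 
only its keys weakly, and each value stays strongly reachable from the map 
for as long as its entry exists. The sketch below is a minimal illustration 
of that behaviour, not FOP code. The class name and the "Type" entry are 
hypothetical, and it assumes the dictionary keys are String literals, which 
is an assumption here rather than something the patch confirms.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical illustration; this is not FOP's PDFDictionary.
final class WeakEntriesSketch {

    // Builds a map the way a dictionary backed by a WeakHashMap might.
    static Map<String, Object> newEntries() {
        Map<String, Object> entries = new WeakHashMap<>();
        // "Type" is a string literal: it stays strongly reachable while this
        // class is loaded, so the key is never cleared and the entry
        // (including its value) stays in the map.
        entries.put("Type", new StringBuilder("Page"));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Object> entries = newEntries();
        System.gc(); // only a hint to the JVM
        System.out.println("entries kept: " + entries.size());
    }
}
```

If the keys really are long-lived strings, a WeakHashMap would behave much 
like a plain HashMap, which may be one reason the change showed little 
effect on its own.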

> [PATCH]Post PDF generation, Soft reference of PDFObject in PDFReference are 
> not immediately garbage collected leading to excessive memory usage.
> ------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: FOP-2937
>                 URL: https://issues.apache.org/jira/browse/FOP-2937
>             Project: FOP
>          Issue Type: Improvement
>    Affects Versions: 2.3, 2.4
>            Reporter: Piyush Khandelwal
>            Priority: Major
>         Attachments: pdfreference.patch
> A PDFReference object holds a SoftReference to a PDFObject (PDFPage, PDFLabel, 
> PDFName, etc.).
> If we generate a huge PDF (*I tried with a PDF of around 150 thousand 
> pages and 12 GB of RAM*), many of these references linger, waiting for 
> the garbage collector to collect them. 
> But the GC won't collect them as long as the JVM is able to recover enough 
> memory without running out of memory.
> Here are some figures from my testing, for a better understanding of the 
> issue.
> Stats for generating 1 PDF:
> *FO size:* 2.03 GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory reached during generation: 11.3 GB
> Residual memory after forced GC: 9 GB
> The FO mainly contains tabular data, with each page sequence having a 
> maximum of 500 rows.
> On analyzing the memory dump, I found many references to PDFPage, PDFName, 
> etc.
> *Question:* Is there any specific reason for using SoftReference in the 
> PDFReference class instead of WeakReference?
> Testing with SoftReference changed to WeakReference in PDFReference shows 
> the following improvements, without any issues in generation.
> Stats for generating 5 PDFs in parallel:
> *FO size:* 2.03GB
> *Generated PDF No. of Pages:* Around 150 K
> RAM: 12 GB
> Peak memory reached during generation: 4 GB
> Residual memory after forced GC: 300 MB
> So, by changing SoftReference to WeakReference, I was able to generate 5 PDFs 
> of 150K pages each in parallel with a maximum of 4 GB of RAM, without any 
> generation issues.
> You can clearly see the memory benefits of changing to WeakReference. 
> But as I don't understand the complete internal details of how FOP works, I 
> would like to understand whether we can target this change and, if not, what 
> the reason behind using SoftReference is.
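
The difference between the two reference types discussed above can be 
sketched as follows. This is a minimal, hypothetical stand-in for a holder 
like PDFReference, not FOP's actual class: the name IndirectRef, its methods 
and the "1 0 R" style labels are assumptions made for illustration.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical stand-in for a PDFReference-like holder; not FOP's actual class.
final class IndirectRef<T> {
    private final String label;        // textual form that outlives the target
    private final Reference<T> target; // soft or weak link to the object

    private IndirectRef(String label, Reference<T> target) {
        this.label = label;
        this.target = target;
    }

    // Soft: the JVM may keep the target until memory runs short.
    static <T> IndirectRef<T> soft(String label, T obj) {
        return new IndirectRef<>(label, new SoftReference<>(obj));
    }

    // Weak: the target is collectable once no strong or soft reference remains.
    static <T> IndirectRef<T> weak(String label, T obj) {
        return new IndirectRef<>(label, new WeakReference<>(obj));
    }

    // Returns the target, or null if it has already been cleared.
    T get() {
        return target.get();
    }

    String label() {
        return label;
    }

    public static void main(String[] args) {
        IndirectRef<byte[]> soft = IndirectRef.soft("1 0 R", new byte[1 << 20]);
        IndirectRef<byte[]> weak = IndirectRef.weak("2 0 R", new byte[1 << 20]);

        System.gc(); // only a hint; results vary by JVM and collector

        System.out.println(soft.label() + " still present: " + (soft.get() != null));
        System.out.println(weak.label() + " still present: " + (weak.get() != null));
    }
}
```

On HotSpot the weak target is typically cleared by the collection while the 
soft one survives, but neither outcome is guaranteed by the specification. 
With a weak link, any caller that still needs the target has to keep its own 
strong reference, since get() may return null at any time after the last 
strong reference is dropped.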

