[ 
https://issues.apache.org/jira/browse/PDFBOX-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005919#comment-17005919
 ] 

Ben Manes commented on PDFBOX-4726:
-----------------------------------

To summarize, a customer sent in PDFs where the scans have a resolution of 
9166x7049. The raster's data buffer is then an int[] of roughly 64.6 million 
elements, which means about 256mb. When multiple files are being processed 
this puts stress on the GC, which allocates these objects in a humongous 
region since they do not fit into the young generation. The GC fails to 
discard them aggressively, so the server crashes with 2.5gb live and 2.5gb 
dead but not yet reclaimed.
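
As a rough check of the numbers above, here is a minimal sketch of the 
arithmetic, assuming one int (4 bytes) per pixel as in an int-backed 
BufferedImage. The class and method names are made up for illustration and 
are not part of PDFBox.

```java
// Hypothetical helper, not a PDFBox API: estimates the size of an
// int-per-pixel raster, ignoring array headers and other object overhead.
final class RasterMemory {
    private RasterMemory() {}

    /** Number of int elements in the data buffer (one per pixel). */
    static long pixelCount(int width, int height) {
        // Widen to long before multiplying so large scans cannot overflow int.
        return (long) width * height;
    }

    /** Approximate bytes held by the int[] backing the raster. */
    static long intRasterBytes(int width, int height) {
        return pixelCount(width, height) * Integer.BYTES;
    }
}
```

For the 9166x7049 scan, RasterMemory.intRasterBytes(9166, 7049) comes out at 
258,444,536 bytes, i.e. a single array of about a quarter of a gigabyte per 
rendered page.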

For our usage we will probably try to calculate the scaling factor to reduce 
down to a reasonable maximum dimension, e.g. 1650. Hopefully that makes this 
more robust and less memory hungry.
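
A minimal sketch of that calculation, under the assumption that the caller 
already knows the pixel dimensions the render would otherwise have. The 
class, method, and parameter names are hypothetical; the only PDFBox call 
referred to is PDFRenderer.renderImage(pageIndex, scale), which would 
receive the caller's usual scale multiplied by this factor.

```java
// Hypothetical helper, not a PDFBox API: computes a down-scaling factor so
// that the longer side of the rendered image does not exceed a maximum.
final class RenderScale {
    private RenderScale() {}

    /**
     * @param widthPx      width the render would have without the limit
     * @param heightPx     height the render would have without the limit
     * @param maxDimension largest allowed side in pixels, e.g. 1650
     * @return a factor in (0, 1]; 1 means the image is already small enough
     */
    static float scaleFor(float widthPx, float heightPx, int maxDimension) {
        if (widthPx <= 0 || heightPx <= 0 || maxDimension <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        float longest = Math.max(widthPx, heightPx);
        // Never upscale: small pages keep their original size.
        return Math.min(1.0f, maxDimension / longest);
    }
}
```

For a 9166x7049 render and a maximum of 1650, the factor is about 0.18, 
giving roughly 1650x1269 pixels, or around 8mb for an int[] raster instead 
of about 256mb.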

Ideally the raster would be swapped to disk instead of being held entirely 
in memory. Keeping it in memory is wasteful and doesn't fit what GC 
algorithms are optimized for. Even if a disk-backed raster is inefficient at 
too large a size, it shouldn't be a memory hog.

> PDFRenderer uses excessive memory
> ---------------------------------
>
>                 Key: PDFBOX-4726
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4726
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Ben Manes
>            Priority: Major
>         Attachments: heap.png, instance.png, stacktrace.png
>
>
> {{PDFRenderer.renderImage}} uses BufferedImage with only in-memory data. This 
> is uncompressed and can use excessive memory. This occurs despite 
> {{MemoryUsageSetting}} being configured on the document for disk space, 
> which should be honored.
> This [stackoverflow answer|https://stackoverflow.com/a/53205617/19450] 
> suggests using a {{WritableRaster}} backed by a temporary file. This change 
> cannot be done in user code and requires updating the {{PDFRenderer}}.
> I am currently trying to track down a PDF that caused out-of-memory issues. 
> From the heap dump only a few {{BufferedImages}} were in memory, but they 
> took 6gb in their uncompressed data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
