[jira] [Commented] (PDFBOX-4726) PDFRenderer uses excessive memory

Ben Manes (Jira) Wed, 01 Jan 2020 00:33:12 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006340#comment-17006340
 ]


Ben Manes commented on PDFBOX-4726:
-----------------------------------

Oh yeah, I certainly expected that {{BigBufferedImage}} would need to be 
understood and rewritten for your needs. I didn't realize it is part of an 
[open-source 
project|https://github.com/DataSystemsLab/GeoSpark/blob/master/viz/src/main/java/org/datasyslab/geosparkviz/core/BigBufferedImage.java]
 and was a little embarrassed at suggesting a bit of random code.
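
For anyone following along, here is a minimal sketch of the general idea of a raster backed by a temporary file. It is not the {{BigBufferedImage}} code and not anything in PDFBox; the names {{FileBackedImages}}, {{MappedDataBuffer}} and {{SimpleRaster}} are made up for illustration, and it assumes a 3-band byte image small enough for a single memory mapping.

```java
import java.awt.Point;
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.PixelInterleavedSampleModel;
import java.awt.image.SampleModel;
import java.awt.image.WritableRaster;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: an RGB image whose pixels live in a memory-mapped file. */
final class FileBackedImages {

    /** DataBuffer that reads and writes samples through a mapped file region. */
    static final class MappedDataBuffer extends DataBuffer {
        private final MappedByteBuffer bytes;

        MappedDataBuffer(MappedByteBuffer bytes, int size) {
            super(DataBuffer.TYPE_BYTE, size);
            this.bytes = bytes;
        }

        @Override
        public int getElem(int bank, int i) {
            return bytes.get(i) & 0xFF; // single bank; widen the byte as unsigned
        }

        @Override
        public void setElem(int bank, int i, int val) {
            bytes.put(i, (byte) val);
        }
    }

    /** WritableRaster's constructor is protected, so a trivial subclass exposes it. */
    static final class SimpleRaster extends WritableRaster {
        SimpleRaster(SampleModel model, DataBuffer buffer) {
            super(model, buffer, new Point(0, 0));
        }
    }

    /** Creates a width x height RGB image whose samples are stored in the given file. */
    static BufferedImage create(int width, int height, Path file) throws IOException {
        int bands = 3;
        long size = (long) width * height * bands;
        if (size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("sketch handles one mapping of under 2 GiB");
        }
        MappedByteBuffer mapped;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
        SampleModel model = new PixelInterleavedSampleModel(
                DataBuffer.TYPE_BYTE, width, height, bands, width * bands, new int[] {0, 1, 2});
        ColorModel colors = new ComponentColorModel(
                ColorSpace.getInstance(ColorSpace.CS_sRGB), false, false,
                Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
        WritableRaster raster = new SimpleRaster(model, new MappedDataBuffer(mapped, (int) size));
        return new BufferedImage(colors, raster, false, null);
    }
}
```

The pixels then sit in the page cache rather than on the Java heap, so the GC never sees a multi-gigabyte array. The trade-off is that every sample access goes through the generic {{DataBuffer}} path, which is likely slower than the built-in array-backed rasters.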

It was taking 4 minutes in the cloud due to GC exhaustion, but less than a 
second locally, at 200 DPI. The problem is that GCs do not handle churning of 
massive objects very well. G1 [does 
not|https://bugs.openjdk.java.net/browse/JDK-8191565] try to collect in STW 
and fails instead, while newer ones do as a last-ditch effort. That's still 
miserable because you don't want many STW events, even if the GC can tolerate 
the abuse by degrading.

In my case, I didn't consider that the PDF could be in an excessively large 
resolution and chose the DPI option for print quality. That was naive on my 
part, as users will always surprise you. The application code fix is the 
correct solution and better all around. However, making this library more 
tolerant of such abuse and a better citizen of the JVM / GC is still a 
desirable goal.
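
As an illustration of what an application-side guard might look like (a sketch, not the actual fix I applied), the uncompressed raster size can be estimated from the page size before rendering and the DPI lowered to fit a budget. The class name {{RenderBudget}}, the byte budget, and the 4 bytes per pixel (as for an int-packed RGB {{BufferedImage}}) are assumptions; the only fixed fact is that default PDF user space is 72 units per inch.

```java
/** Hypothetical helper: estimates raster memory for a page and caps the DPI to a budget. */
final class RenderBudget {

    /** Default PDF user space is 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Bytes needed for an uncompressed raster of the page at the given DPI. */
    static long estimateBytes(double widthPt, double heightPt, double dpi, int bytesPerPixel) {
        long widthPx = (long) Math.ceil(widthPt / POINTS_PER_INCH * dpi);
        long heightPx = (long) Math.ceil(heightPt / POINTS_PER_INCH * dpi);
        return widthPx * heightPx * bytesPerPixel;
    }

    /** Returns requestedDpi, or a lower DPI whose raster fits within maxBytes. */
    static double capDpi(double widthPt, double heightPt, double requestedDpi,
                         int bytesPerPixel, long maxBytes) {
        if (maxBytes < bytesPerPixel) {
            throw new IllegalArgumentException("budget is smaller than a single pixel");
        }
        long needed = estimateBytes(widthPt, heightPt, requestedDpi, bytesPerPixel);
        if (needed <= maxBytes) {
            return requestedDpi;
        }
        // Pixel count grows with the square of the DPI, so scale by the square root.
        double dpi = requestedDpi * Math.sqrt((double) maxBytes / needed);
        // Rounding up to whole pixels can overshoot slightly; back off until it fits.
        while (estimateBytes(widthPt, heightPt, dpi, bytesPerPixel) > maxBytes) {
            dpi *= 0.99;
        }
        return dpi;
    }

    public static void main(String[] args) {
        // US Letter (612 x 792 pt) at 200 DPI is 1700 x 2200 pixels.
        System.out.println(estimateBytes(612, 792, 200, 4));
        // A 200 x 200 inch page at 200 DPI is 40000 x 40000 pixels.
        System.out.println(estimateBytes(14400, 14400, 200, 4));
        // The same page with a 256 MiB budget gets a much lower DPI.
        System.out.println(capDpi(14400, 14400, 200, 4, 256L * 1024 * 1024));
    }
}
```

A normal page passes through unchanged, while an oversized one is rendered at reduced resolution instead of allocating gigabytes.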

For {{MemoryUsageSetting}}, I use a 2 MB heap and a temp directory for every 
application usage. This configuration is used in scenarios like rendering 
pages, generating a PDF, and merging PDFs. For documents large and small, I 
haven't observed any performance problems related to this. I like that it 
gives me the choice of being in-memory by default, while allowing overflow to 
disk if deemed preferable.

> PDFRenderer uses excessive memory
> ---------------------------------
>
>                 Key: PDFBOX-4726
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4726
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Ben Manes
>            Priority: Major
>         Attachments: heap.png, instance.png, stacktrace.png
>
>
> {{PDFRenderer.renderImage}} uses BufferedImage with only in-memory data. This 
> is uncompressed and can use excessive memory. This occurs despite 
> {{MemoryUsageSetting}} being configured on the document for disk space, 
> which should be honored.
> This [stackoverflow answer|https://stackoverflow.com/a/53205617/19450] 
> suggests using a {{WritableRaster}} backed by a temporary file. This change 
> cannot be done in user code and requires updating the {{PDFRenderer}}.
> I am currently trying to track down a PDF that caused out-of-memory issues. 
> From the heap dump only a few {{BufferedImages}} were in memory, but they 
> took 6 GB in their uncompressed data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

