[ https://issues.apache.org/jira/browse/PDFBOX-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006340#comment-17006340 ]
Ben Manes commented on PDFBOX-4726:
-----------------------------------
Oh yeah, I certainly assumed {{BigBufferedImage}} would need to be rewritten for
your needs, and I understood that. I didn't realize it was part of an
[open-source project|https://github.com/DataSystemsLab/GeoSpark/blob/master/viz/src/main/java/org/datasyslab/geosparkviz/core/BigBufferedImage.java]
and was a little embarrassed to have suggested a random bit of code.
At 200 DPI, rendering was taking 4 minutes in the cloud due to GC exhaustion, but
less than a second locally. The problem is that GCs do not handle churn of
massive objects very well. G1 [does not|https://bugs.openjdk.java.net/browse/JDK-8191565]
try to collect them in a stop-the-world (STW) pause and fails instead, while
newer ones do so as a last-ditch effort. That's still miserable, because you
don't want many STW events, even if the GC can tolerate the abuse by degrading.
In my case, I didn't consider that the PDF could have an excessively large
resolution, and I chose the DPI option for print quality. That was naive on my
part, as users will always surprise you. Fixing the application code is the
correct solution and better all around. However, making this library more
tolerant of such abuse, and a better citizen of the JVM / GC, is still a
desirable goal.
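A rough way to see why the DPI option matters: assuming the renderer scales each page by dpi / 72 (PDF user space has 72 units per inch) and stores 4 bytes per pixel, as an int-packed {{BufferedImage}} does, the heap needed per page is easy to estimate. This is a hypothetical back-of-the-envelope sketch; {{RenderCost}} and its methods are made-up names, and rounding down to whole pixels is an assumption.

```java
/**
 * Hypothetical estimate of the heap needed for one rendered page.
 * Assumes the page is scaled by dpi / 72 and stored as 4 bytes per pixel
 * (int-packed RGB/ARGB). Not PDFBox code.
 */
final class RenderCost {
    static final int BYTES_PER_PIXEL = 4;

    /** Pixel count for a page measured in PDF points (1/72 inch) at the given DPI. */
    static long pixels(double widthPt, double heightPt, double dpi) {
        // Multiply before dividing so whole-number inputs stay exact in double.
        long w = (long) Math.floor(widthPt * dpi / 72.0);
        long h = (long) Math.floor(heightPt * dpi / 72.0);
        return w * h;
    }

    /** Uncompressed size in bytes of the rendered page image. */
    static long bytes(double widthPt, double heightPt, double dpi) {
        return pixels(widthPt, heightPt, dpi) * BYTES_PER_PIXEL;
    }
}
```

For a US Letter page (612 x 792 points) at 200 DPI this gives 14,960,000 bytes (about 15 MB), while a 200 x 200 inch page (14,400 x 14,400 points) at the same DPI needs 6,400,000,000 bytes, the same order of magnitude as the heap dump described in the issue.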
For {{MemoryUsageSetting}}, I use a 2 MB main-memory limit and a temp directory
for every usage in the application. This configuration is used in scenarios like
rendering pages, generating a PDF, and merging PDFs. For documents large and
small, I haven't observed any performance problems related to this. I like that
it gives me the choice of staying in memory by default while allowing overflow
to disk if deemed preferable.
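The policy described above (heap by default, overflow to disk) can be modeled with a small, hypothetical JDK-only sketch. In PDFBox 2.x the setting itself is created with something like {{MemoryUsageSetting.setupMixed(2 * 1024 * 1024).setTempDir(tempDir)}} and passed to {{PDDocument.load(file, setting)}}; the class below is not PDFBox code and not how PDFBox's scratch file works internally. {{SpillingBuffer}}, {{write}}, and {{readAll}} are made-up names for illustration only.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical mixed memory policy: bytes stay on the heap until
 * maxMainMemoryBytes would be exceeded, then everything moves to a temp file.
 */
final class SpillingBuffer implements Closeable {
    private final long maxMainMemoryBytes;
    private final Path tempDir;
    private ByteArrayOutputStream heap = new ByteArrayOutputStream();
    private Path spillFile;        // null while the data is still in memory
    private OutputStream fileOut;
    private long size;

    SpillingBuffer(long maxMainMemoryBytes, Path tempDir) {
        this.maxMainMemoryBytes = maxMainMemoryBytes;
        this.tempDir = tempDir;
    }

    void write(byte[] b, int off, int len) throws IOException {
        if (spillFile == null && size + len > maxMainMemoryBytes) {
            // Overflow: copy what was buffered so far to disk and drop the heap copy.
            spillFile = Files.createTempFile(tempDir, "spill", ".tmp");
            fileOut = new BufferedOutputStream(Files.newOutputStream(spillFile));
            heap.writeTo(fileOut);
            heap = null;
        }
        if (spillFile == null) {
            heap.write(b, off, len);
        } else {
            fileOut.write(b, off, len);
        }
        size += len;
    }

    boolean isOnDisk() { return spillFile != null; }

    long size() { return size; }

    /** Reads everything back; fine for a demo, but defeats the purpose for huge data. */
    byte[] readAll() throws IOException {
        if (spillFile == null) return heap.toByteArray();
        fileOut.flush();
        return Files.readAllBytes(spillFile);
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) fileOut.close();
        if (spillFile != null) Files.deleteIfExists(spillFile);
    }
}
```

Small inputs never touch the disk, which matches the "in-memory by default" behavior; only once the limit is crossed does the cost of a temp file appear.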
> PDFRenderer uses excessive memory
> ---------------------------------
>
> Key: PDFBOX-4726
> URL: https://issues.apache.org/jira/browse/PDFBOX-4726
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Ben Manes
> Priority: Major
> Attachments: heap.png, instance.png, stacktrace.png
>
>
> {{PDFRenderer.renderImage}} uses a {{BufferedImage}} with only in-memory data. This
> is uncompressed and can use excessive memory. This occurs despite
> {{MemoryUsageSetting}} being configured on the document to use disk space,
> which should be honored.
> This [stackoverflow answer|https://stackoverflow.com/a/53205617/19450]
> suggests using a {{WritableRaster}} backed by a temporary file. This change
> cannot be done in user code and requires updating the {{PDFRenderer}}.
> I am currently trying to track down a PDF that caused out-of-memory issues.
> In the heap dump, only a few {{BufferedImage}}s were in memory, but their
> uncompressed data took 6 GB.
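The stackoverflow suggestion quoted above (a {{WritableRaster}} backed by a temporary file) can be sketched with only the JDK: a custom {{DataBuffer}} that reads and writes ints in a memory-mapped file, wrapped in a {{BufferedImage}} with the default ARGB {{ColorModel}}. This is a hypothetical, minimal sketch, not the linked answer's code and not a PDFBox API; {{FileBackedImages}} and {{MappedIntDataBuffer}} are made-up names. A single mapping limits it to 2 GB of pixel data, and temp-file cleanup is left out.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.awt.image.SampleModel;
import java.awt.image.WritableRaster;
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: ARGB images whose pixel data lives in a memory-mapped file. */
final class FileBackedImages {

    /** A DataBuffer whose int elements are stored in a mapped file instead of a heap array. */
    static final class MappedIntDataBuffer extends DataBuffer {
        private final IntBuffer pixels;

        MappedIntDataBuffer(int size, Path file) throws IOException {
            super(DataBuffer.TYPE_INT, size);
            long bytes = (long) size * Integer.BYTES;
            if (bytes > Integer.MAX_VALUE) {
                // One MappedByteBuffer cannot exceed 2 GB; bigger images need several mappings.
                throw new IllegalArgumentException("too large for one mapping: " + bytes + " bytes");
            }
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // map() grows the file as needed; the mapping outlives the closed channel.
                pixels = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes).asIntBuffer();
            }
        }

        // Only bank 0 is used by the single-pixel-packed sample model below.
        @Override public int getElem(int bank, int i) { return pixels.get(i); }
        @Override public void setElem(int bank, int i, int val) { pixels.put(i, val); }
    }

    /** Creates a TYPE_CUSTOM, non-premultiplied ARGB image backed by {@code file}. */
    static BufferedImage create(int width, int height, Path file) throws IOException {
        ColorModel cm = ColorModel.getRGBdefault();          // 32-bit packed ARGB
        SampleModel sm = cm.createCompatibleSampleModel(width, height);
        DataBuffer db = new MappedIntDataBuffer(Math.multiplyExact(width, height), file);
        WritableRaster raster = Raster.createWritableRaster(sm, db, new Point(0, 0));
        return new BufferedImage(cm, raster, false, null);
    }
}
```

Drawing into such an image works through {{img.createGraphics()}}, but since it is a {{TYPE_CUSTOM}} image, expect Java2D to fall back to its slower generic paths, so this trades rendering speed for heap.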
--
This message was sent by Atlassian Jira
(v8.3.4#803005)