[ https://issues.apache.org/jira/browse/PDFBOX-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024607#comment-17024607 ]
Ben Manes commented on PDFBOX-4726:
-----------------------------------
This occurred again; see the 1-26-20 attachments. The threads were in
{{PDFRenderer}} and the heap is consumed by {{BufferedImage}} and {{int[]}}
arrays. The application code doing this work is below; it tries to be as
memory-friendly as possible. Still, it fails due to PDFBox.

About 50% of the heap is eligible for GC, but because the objects are large, G1
does not collect them before the failure. This is on JDK 11, and I will probably
try switching to JDK 13 with Shenandoah. Ideally, though, the image would not be
rendered fully in memory, and the maximum would be bounded by the target
dimensions.

Do you have any advice here?
{code}
private static final Dimensions TARGET_DIMENSIONS = Dimensions.create(1650, 1650);
private static final String FORMAT = "jpg";

/** Renders the page to an image and returns the file path. */
private Path renderPage(Context context, Pdf pdf, PdfMetadata metadata,
    PDDocument document, int pageNumber) throws IOException {
  BufferedImage image = null;
  try {
    String name = String.format("page_%d.%s", (pageNumber + 1), FORMAT);
    Path path = context.storage().tempDirectory(pdf.getUniqueId()).resolve(name);

    PDRectangle cropBox = document.getPage(pageNumber).getCropBox();
    float scaleY = TARGET_DIMENSIONS.getHeight() / cropBox.getHeight();
    float scaleX = TARGET_DIMENSIONS.getWidth() / cropBox.getWidth();
    float scaleBy = Math.max(scaleX, scaleY);

    image = new PDFRenderer(document).renderImage(pageNumber, scaleBy);
    ImageIO.write(image, FORMAT, path.toFile());
    return path;
  } finally {
    if (image != null) {
      image.getGraphics().dispose();
      image.flush();
    }
  }
}
{code}
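
As a rough sketch of the sizing arithmetic in the code above (plain Java, not PDFBox code), the uncompressed size of the rendered raster can be estimated before rendering. The class and method names ({{RenderBudget}}, {{estimateBytes}}, {{fitScale}}, {{coverScale}}) are hypothetical. The 4 bytes per pixel figure assumes an int-backed RGB image, which would match the {{int[]}} arrays seen in the heap, and the flooring of the scaled dimensions is only an approximation of how the renderer sizes its image.

```java
final class RenderBudget {
    // Assumption: one int (4 bytes) per pixel, as in an int-backed RGB BufferedImage.
    private static final int BYTES_PER_PIXEL = 4;

    /** Estimates the uncompressed raster size for a page of the given size (in points) at a scale. */
    static long estimateBytes(float widthPt, float heightPt, float scale) {
        long widthPx = Math.max((long) Math.floor(widthPt * scale), 1L);
        long heightPx = Math.max((long) Math.floor(heightPt * scale), 1L);
        return widthPx * heightPx * BYTES_PER_PIXEL;
    }

    /** Scale at which the page fits inside the target box (both sides at most the target). */
    static float fitScale(float widthPt, float heightPt, float targetW, float targetH) {
        return Math.min(targetW / widthPt, targetH / heightPt);
    }

    /** Scale at which the page covers the target box (both sides at least the target). */
    static float coverScale(float widthPt, float heightPt, float targetW, float targetH) {
        return Math.max(targetW / widthPt, targetH / heightPt);
    }

    public static void main(String[] args) {
        // Hypothetical page with an extreme aspect ratio: 100 x 10000 points.
        float w = 100f, h = 10000f, target = 1650f;
        float cover = coverScale(w, h, target, target);
        float fit = fitScale(w, h, target, target);
        System.out.println("cover scale bytes: " + estimateBytes(w, h, cover));
        System.out.println("fit scale bytes:   " + estimateBytes(w, h, fit));
    }
}
```

The {{Math.max}} choice in {{renderPage}} corresponds to {{coverScale}} here, so only the smaller page side is pinned to the target and the other side can grow without limit on pages with extreme aspect ratios. Whether that is intended is an assumption about the application. If both sides are meant to stay within the target, the {{Math.min}} variant bounds the raster at roughly target width × target height × 4 bytes.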
> PDFRenderer uses excessive memory
> ---------------------------------
>
> Key: PDFBOX-4726
> URL: https://issues.apache.org/jira/browse/PDFBOX-4726
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Ben Manes
> Priority: Major
> Attachments: heap 1-26-20.png, heap.png, instance.png, reachability 1-26-20.png, stacktrace.png
>
>
> {{PDFRenderer.renderImage}} uses a {{BufferedImage}} with only in-memory data.
> This data is uncompressed and can use excessive memory. This occurs even when
> {{MemoryUsageSetting}} is configured on the document to use disk space,
> which should be honored.
> This [stackoverflow answer|https://stackoverflow.com/a/53205617/19450]
> suggests using a {{WritableRaster}} backed by a temporary file. This change
> cannot be done in user code and requires updating the {{PDFRenderer}}.
> I am currently trying to track down a PDF that caused out-of-memory issues.
> From the heap dump, only a few {{BufferedImages}} were in memory, but they
> took 6 GB in their uncompressed data.