[ https://issues.apache.org/jira/browse/PDFBOX-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365270#comment-17365270 ]
lanshiqin edited comment on PDFBOX-5217 at 6/18/21, 6:35 AM:
-------------------------------------------------------------
I found that allowing the renderer to subsample images before drawing them
effectively reduces the peak memory needed to render the file. This usage
example should be included in the FAQ to help anyone who needs it:
{code:java}
PDFRenderer renderer = new PDFRenderer(document);
// Allow the renderer to subsample images before drawing them.
// This is very important to prevent OOM when rendering complex PDF files.
renderer.setSubsamplingAllowed(true);
{code}
[PDF file|https://c-dev.weimobwmc.com/qa-OmdK/52f40ec1d49f4c5985f33f691d19c1a4.pdf]
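To illustrate why subsampling helps, here is a rough back-of-the-envelope sketch. It is not PDFBox code and does not reproduce PDFBox's own subsampling logic. It assumes that an embedded image decoded at full resolution needs about width × height × bytesPerPixel bytes, and that keeping only every n-th pixel in each direction shrinks this by roughly n². The class name, the helper method, the 3-bytes-per-pixel RGB assumption and the 10000 × 10000 image size are hypothetical illustration values, not taken from the attached PDF.

```java
public class SubsamplingEstimate {

    // Hypothetical helper: bytes needed to hold a decoded image when only
    // every n-th pixel is kept in each direction (n = 1 means no subsampling).
    // Dimensions are rounded up so pixels at the right/bottom edge still count.
    static long subsampledBytes(long width, long height, int bytesPerPixel, int n) {
        long w = (width + n - 1) / n;
        long h = (height + n - 1) / n;
        return w * h * bytesPerPixel;
    }

    public static void main(String[] args) {
        long width = 10_000;  // illustrative image width in pixels
        long height = 10_000; // illustrative image height in pixels
        int bytesPerPixel = 3; // assumption: uncompressed 8-bit RGB
        for (int n : new int[] {1, 2, 4, 8}) {
            long bytes = subsampledBytes(width, height, bytesPerPixel, n);
            System.out.printf("subsampling %d: %,d bytes (%.1f MB)%n",
                    n, bytes, bytes / (1024.0 * 1024.0));
        }
    }
}
```

Under these assumptions a subsampling factor of 4 cuts the decoded buffer for such an image from 300,000,000 bytes to 18,750,000 bytes, which is why the setting matters most for PDFs containing large, high-resolution scans.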
> Rendering takes up too much memory, easy OOM
> --------------------------------------------
>
> Key: PDFBOX-5217
> URL: https://issues.apache.org/jira/browse/PDFBOX-5217
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.24, 3.0.0 PDFBox
> Environment: Oracle JDK 1.8.0_291-b10
> MacOS BigSur (CPU i5, RAM 8GB)
> Windows 10 (CPU i7 2.80GHz, RAM 16GB)
> Reporter: lanshiqin
> Priority: Major
>
> Converting a 20MB PDF file to images consumes more than 8GB of memory and
> takes 5 minutes, which is unacceptable.
> While debugging, I found that memory usage soared when the file stream was
> finally read.
> This is my code:
>
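> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.InputStream;
> import java.net.URL;
> import java.util.ArrayList;
> import java.util.List;
> import javax.imageio.ImageIO;
> import org.apache.commons.io.FileUtils;
> import org.apache.pdfbox.io.MemoryUsageSetting;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> // pdfFileUrl, DPI, OFFICE_CONVERT_TEMP_DIR and fileName are defined elsewhere.
> try (InputStream in = new URL(pdfFileUrl).openStream();
>      PDDocument document = PDDocument.load(in, MemoryUsageSetting.setupTempFileOnly())) {
>     document.setResourceCache(null);
>     PDFRenderer renderer = new PDFRenderer(document);
>     List<String> imgUrlList = new ArrayList<>();
>     for (int i = 0; i < document.getNumberOfPages(); i++) {
>         BufferedImage bufferedImage = renderer.renderImageWithDPI(i, DPI);
>         File tempFile = new File(OFFICE_CONVERT_TEMP_DIR + fileName + "_" + i);
>         try {
>             ImageIO.write(bufferedImage, "png", tempFile);
>             imgUrlList.add("upload to media center get url todo " + i);
>         } finally {
>             FileUtils.deleteQuietly(tempFile);
>             // Creates and immediately disposes a new Graphics object;
>             // this does not release the image's pixel data.
>             bufferedImage.getGraphics().dispose();
>         }
>     }
>     return imgUrlList;
> }
> {code}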
>
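For comparison, the bitmap that renderImageWithDPI returns for each page also grows with the square of the DPI. The sketch below is a rough estimate under stated assumptions: PDF user space has 72 points per inch, the rendered size is taken as the page size in points scaled by dpi / 72 and rounded down, and each pixel takes 4 bytes as in an int-packed RGB BufferedImage. The class name, the helper method and the US Letter page size are illustrative choices, not values from the report.

```java
public class PageBitmapEstimate {

    // PDF user space has 72 units (points) per inch.
    static final double POINTS_PER_INCH = 72.0;

    // Assumption: bitmap size = page size in points * dpi / 72, rounded down,
    // stored at 4 bytes per pixel (int-packed RGB). Not PDFBox's own code.
    static long pageBitmapBytes(double widthPt, double heightPt, float dpi) {
        long widthPx = Math.max((long) Math.floor(widthPt * dpi / POINTS_PER_INCH), 1);
        long heightPx = Math.max((long) Math.floor(heightPt * dpi / POINTS_PER_INCH), 1);
        return widthPx * heightPx * 4;
    }

    public static void main(String[] args) {
        // US Letter page: 612 x 792 points (8.5 x 11 inches).
        for (float dpi : new float[] {72f, 150f, 300f}) {
            long bytes = pageBitmapBytes(612, 792, dpi);
            System.out.printf("%.0f DPI: %,d bytes (%.1f MB)%n",
                    dpi, bytes, bytes / (1024.0 * 1024.0));
        }
    }
}
```

Lowering the DPI shrinks this per-page output buffer, whereas setSubsamplingAllowed(true) targets the embedded images that are decoded while the page is being drawn.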
--
This message was sent by Atlassian Jira
(v8.3.4#803005)