[ https://issues.apache.org/jira/browse/PDFBOX-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365270#comment-17365270 ]
lanshiqin edited comment on PDFBOX-5217 at 6/18/21, 6:35 AM:
-------------------------------------------------------------

I found that allowing the renderer to subsample images before drawing them effectively reduces the peak memory required to process the file. A usage example should be included in the FAQ to help anyone who needs it.

{code:java}
// Indicates that the renderer is allowed to subsample the image before drawing.
// This is very important to prevent OOM when parsing complex PDF files
renderer.setSubsamplingAllowed(true);
{code}

[PDF file|https://c-dev.weimobwmc.com/qa-OmdK/52f40ec1d49f4c5985f33f691d19c1a4.pdf]

was (Author: lanshiqin):
    I found that allowing the renderer to subsample images before drawing them effectively reduces the peak memory required to process the file. A usage example should be included in the FAQ to help anyone who needs it.

{code:java}
// Indicates that the renderer is allowed to subsample the image before drawing.
// This is very important to prevent OOM when parsing complex PDF files
renderer.setSubsamplingAllowed(true);
{code}

[^example.pdf]

> Rendering takes up too much memory, easy OOM
> --------------------------------------------
>
>                 Key: PDFBOX-5217
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5217
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.24, 3.0.0 PDFBox
>         Environment: Oracle JDK 1.8.0_291-b10
> macOS Big Sur (CPU i5, RAM 8GB)
> Windows 10 (CPU i7 2.80GHz, RAM 16GB)
>            Reporter: lanshiqin
>            Priority: Major
>
> Converting a 20MB PDF file to images consumes more than 8GB of memory and
> takes 5 minutes, which is unacceptable.
> Debugging showed that memory usage soared when the file stream was finally read.
> This is my code:
>
> {code:java}
> try (InputStream in = new URL(pdfFileUrl).openStream();
>      PDDocument document = PDDocument.load(in, MemoryUsageSetting.setupTempFileOnly())) {
>     document.setResourceCache(null);
>     PDFRenderer renderer = new PDFRenderer(document);
>     List<String> imgUrlList = Lists.newArrayList();
>     for (int i = 0; i < document.getNumberOfPages(); i++) {
>         BufferedImage bufferedImage = renderer.renderImageWithDPI(i, DPI);
>         File tempFile = new File(OFFICE_CONVERT_TEMP_DIR + fileName + "_" + i);
>         try {
>             ImageIO.write(bufferedImage, "png", tempFile);
>             imgUrlList.add("upload to media center get url todo " + i);
>         } finally {
>             FileUtils.deleteQuietly(tempFile);
>             bufferedImage.getGraphics().dispose();
>         }
>     }
>     return imgUrlList;
> }
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
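
The comment above credits subsampling with the drop in peak memory. As a rough illustration of why that helps, the sketch below decodes an image while keeping only every Nth pixel, using the JDK's own `javax.imageio` API. It is not PDFBox's implementation and makes no claim about how `setSubsamplingAllowed(true)` works internally; the class name `SubsamplingSketch`, the method `decodeSubsampled`, and the 800x600 test image are assumptions made up for this example.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;

// Hypothetical illustration class; not part of PDFBox.
public class SubsamplingSketch {

    /** Decodes an encoded image, keeping only every step-th pixel in each direction. */
    public static BufferedImage decodeSubsampled(byte[] encoded, int step) throws IOException {
        try (ImageInputStream in =
                     new MemoryCacheImageInputStream(new ByteArrayInputStream(encoded))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageReader available for this format");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                ImageReadParam param = reader.getDefaultReadParam();
                // Read every step-th column and row, starting at offset 0.
                param.setSourceSubsampling(step, step, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed test input: a blank 800x600 image encoded as PNG in memory.
        BufferedImage source = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ImageIO.write(source, "png", buffer);

        BufferedImage small = decodeSubsampled(buffer.toByteArray(), 4);
        System.out.println(small.getWidth() + "x" + small.getHeight()); // prints "200x150"
    }
}
```

With a step of 4 the decoded raster holds one sixteenth of the original pixels, which is the kind of saving that matters when a page embeds images far larger than the size at which they are drawn.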