[ https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
asdpboy updated PDFBOX-5950:
----------------------------
    Attachment: after.png
                before.png
                oom.png

> pdfbox PDFMergerUtility Potential OOM issues
> ---------------------------------------------
>
>                 Key: PDFBOX-5950
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5950
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.32, 3.0.0 PDFBox
>         Environment: jdk11
>            Reporter: asdpboy
>            Priority: Major
>         Attachments: after.png, before.png, oom.png
>
>
> I have identified a potential bug in Apache PDFBox and would like to report
> it. Below are the details:
>
> - *PDFBox Version*: 2.0.32, 3.0.0
> - *Java Version*: 11
>
> When there are a large number of sources (e.g., thousands), the `tobeclosed`
> list keeps every loaded source PDF document in memory until the merge
> finishes. This poses a risk of Out-of-Memory (OOM) errors during the merge
> process.
>
> The following adjustment can be made: close the sourceDoc object immediately
> after it has been appended.
>
> org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
> for (Object sourceObject : sources)
> {
>     PDDocument sourceDoc = null;
>     if (sourceObject instanceof File)
>     {
>         sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
>     }
>     else
>     {
>         sourceDoc = PDDocument.load((InputStream) sourceObject, partitionedMemSetting);
>     }
>     // Proposed change: close each source as soon as it has been appended
>     try
>     {
>         appendDocument(destination, sourceDoc);
>     }
>     finally
>     {
>         IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
>     }
> }
>
> One of the OOM cases:
>
> Comparison of memory usage before and after the modification (merging a
> 16.8 MB file 200 times, with the JVM heap size limit set to 2 GB)
> - *Before Modification*: An OutOfMemoryError (OOM) occurred after just over
> 1 minute of operation. Because heap memory was insufficient, Full GC was
> triggered frequently, which can be seen in the CPU usage curve on the left.
> !https://cdn.nlark.com/yuque/0/2025/png/29126839/1738922546412-1478a69c-f8b5-4412-80b7-7f573f5ffa7f.png?x-oss-process=image%2Fformat%2Cwebp%2Fresize%2Cw_1500%2Climit_0!
>
> - *After Modification*: Heap memory is now reclaimed normally and no OOM
> occurs.
> !https://cdn.nlark.com/yuque/0/2025/png/29126839/1738922398317-d36dce8f-8aea-4dcf-b809-78e2333b514f.png?x-oss-process=image%2Fformat%2Cwebp%2Fresize%2Cw_1500%2Climit_0!
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
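
The difference between the two closing strategies described in the report can be sketched without PDFBox. The example below is only an illustration under stated assumptions: `TrackedDoc` is a hypothetical stand-in for `PDDocument` that counts how many instances are open at the same time, and `CloseEarlyDemo`, `peakOpenDeferred` and `peakOpenImmediate` are names invented for this sketch. It models resource lifetime only and says nothing about how PDFBox itself behaves when a source is closed early.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

final class CloseEarlyDemo {

    // Hypothetical stand-in for PDDocument (not a PDFBox class):
    // tracks how many instances are open simultaneously.
    static final class TrackedDoc implements Closeable {
        static int open = 0;
        static int peak = 0;

        TrackedDoc() {
            open++;
            peak = Math.max(peak, open);
        }

        @Override
        public void close() {
            open--;
        }

        static void reset() {
            open = 0;
            peak = 0;
        }
    }

    // Deferred closing: every source stays open until the whole loop is done,
    // similar to collecting documents in a list and closing them at the end.
    static int peakOpenDeferred(int sourceCount) {
        TrackedDoc.reset();
        List<TrackedDoc> toBeClosed = new ArrayList<>();
        try {
            for (int i = 0; i < sourceCount; i++) {
                TrackedDoc doc = new TrackedDoc();
                toBeClosed.add(doc);
                // the append step would run here
            }
        } finally {
            for (TrackedDoc doc : toBeClosed) {
                doc.close();
            }
        }
        return TrackedDoc.peak;
    }

    // Immediate closing: each source is closed right after its append step,
    // which is the shape of the change proposed in the report.
    static int peakOpenImmediate(int sourceCount) {
        TrackedDoc.reset();
        for (int i = 0; i < sourceCount; i++) {
            try (TrackedDoc doc = new TrackedDoc()) {
                // the append step would run here
            }
        }
        return TrackedDoc.peak;
    }

    public static void main(String[] args) {
        System.out.println("deferred peak:  " + peakOpenDeferred(200));
        System.out.println("immediate peak: " + peakOpenImmediate(200));
    }
}
```

With 200 sources, as in the reported test, the deferred variant reaches a peak of 200 simultaneously open documents, while the immediate variant never has more than 1 open.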