[ https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925443#comment-17925443 ]
asdpboy commented on PDFBOX-5950:
---------------------------------

[~tilman] Thank you for replying. I have done a lot of testing with my modified code and have not been able to reproduce the situation you mentioned. From the code of legacyMergeDocuments, it does not appear that any other objects of sourceDoc are still being held. Can you show me code that triggers the situation you mentioned?

> pdfbox PDFMergerUtility Potential OOM issue
> --------------------------------------------
>
>                 Key: PDFBOX-5950
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5950
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.32, 3.0.0 PDFBox
>         Environment: jdk11
>            Reporter: asdpboy
>            Priority: Major
>         Attachments: after.png, before.png, oom.png
>
>
> I have identified a potential bug in Apache PDFBox and would like to report it. Below are the details:
>
> When there are a large number of sources (e.g., thousands), the `tobeclosed` list keeps every loaded PDF document in memory until the merge finishes. This may pose a risk of Out-of-Memory (OOM) during the merge process.
>
> The following adjustment can be made: close the sourceDoc object immediately after it has been appended.
>
> org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
> {code:java}
> for (Object sourceObject : sources)
> {
>     PDDocument sourceDoc = null;
>     if (sourceObject instanceof File)
>     {
>         sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
>     }
>     else
>     {
>         sourceDoc = PDDocument.load((InputStream) sourceObject, partitionedMemSetting);
>     }
>     try
>     {
>         appendDocument(destination, sourceDoc);
>     }
>     finally
>     {
>         // close this source right away instead of collecting it in tobeclosed
>         IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
>     }
> }
> {code}
> One of the OOM cases:
> !oom.png!
> Comparison of memory usage before and after the modification (merging a 16.8 MB file 200 times, with the JVM heap size limited to 2 GB)
> Before Modification: An OutOfMemoryError (OOM) occurred after just over 1 minute of operation.
> Due to insufficient heap memory, Full GC (Full Garbage Collection) was triggered frequently, which can be observed from the CPU usage curve on the left.
> !before.png!
> After Modification: The heap memory is now able to be collected normally without causing an OOM.
> !after.png!
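
The difference between the two closing strategies discussed in the issue can be sketched without PDFBox at all. The following is a minimal, hypothetical illustration: `FakeDocument` and `Tracker` are stand-ins invented for this sketch (they are not PDFBox classes), and the loop bodies only mark where `appendDocument` would run. It shows how many sources are open at once under each strategy. It does not model whether the destination still references data backed by an already closed source, which is the open question in this thread.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

public class MergeCloseSketch {

    /** Counts how many stand-in documents are open at the same time. */
    static final class Tracker {
        int open;
        int peak;
    }

    /** Hypothetical stand-in for PDDocument; it only records open/close events. */
    static final class FakeDocument implements Closeable {
        private final Tracker tracker;

        FakeDocument(Tracker tracker) {
            this.tracker = tracker;
            tracker.open++;
            tracker.peak = Math.max(tracker.peak, tracker.open);
        }

        @Override
        public void close() {
            tracker.open--;
        }
    }

    /** Deferred pattern: every source stays open until the whole merge is done. */
    public static int peakWithDeferredClose(int sourceCount) {
        Tracker tracker = new Tracker();
        List<FakeDocument> toBeClosed = new ArrayList<>();
        try {
            for (int i = 0; i < sourceCount; i++) {
                FakeDocument source = new FakeDocument(tracker);
                toBeClosed.add(source);
                // appendDocument(destination, source) would run here
            }
        } finally {
            for (FakeDocument doc : toBeClosed) {
                doc.close();
            }
        }
        return tracker.peak;
    }

    /** Proposed pattern: each source is closed right after it has been appended. */
    public static int peakWithImmediateClose(int sourceCount) {
        Tracker tracker = new Tracker();
        for (int i = 0; i < sourceCount; i++) {
            try (FakeDocument source = new FakeDocument(tracker)) {
                // appendDocument(destination, source) would run here
            }
        }
        return tracker.peak;
    }

    public static void main(String[] args) {
        System.out.println("deferred close, peak open:  " + peakWithDeferredClose(200));
        System.out.println("immediate close, peak open: " + peakWithImmediateClose(200));
    }
}
```

With 200 sources, the deferred variant reaches a peak of 200 simultaneously open stand-ins, while the immediate variant never has more than 1 open. Real heap usage depends on what each open document actually retains, so this only illustrates the lifetime pattern, not the measured numbers in the screenshots.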