[
https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
asdpboy updated PDFBOX-5950:
----------------------------
Summary: pdfbox PDFMergerUtility Potential OOM issue (was: pdfbox
PDFMergerUtility Potential OOM issues)
> pdfbox PDFMergerUtility Potential OOM issue
> --------------------------------------------
>
> Key: PDFBOX-5950
> URL: https://issues.apache.org/jira/browse/PDFBOX-5950
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.32, 3.0.0 PDFBox
> Environment: jdk11
> Reporter: asdpboy
> Priority: Major
> Attachments: after.png, before.png, oom.png
>
>
> I have identified a potential bug in Apache PDFBox and would like to report
> it. Below are the details:
>
> - *PDFBox Version*: 2.0.32, 3.0.0
> - *Java Version*: 11
>
> When there are a large number of sources (e.g., thousands), every loaded
> source document is held in the `tobeclosed` list and stays in memory until
> the merge finishes. This poses a risk of Out-of-Memory (OOM) errors during
> the merge process.
>
> One possible fix is to close each sourceDoc object immediately after it has
> been appended:
>
> org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
> for (Object sourceObject : sources) {
>     PDDocument sourceDoc = null;
>     if (sourceObject instanceof File) {
>         sourceDoc = PDDocument.load((File) sourceObject,
>                 partitionedMemSetting);
>     } else {
>         sourceDoc = PDDocument.load((InputStream) sourceObject,
>                 partitionedMemSetting);
>     }
>     // proposed change: close each source right after it is appended
>     try {
>         appendDocument(destination, sourceDoc);
>     } finally {
>         IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
>     }
> }
>
> One of the OOM cases:
> !oom.png!
> Comparison of memory usage before and after the modification (merging a
> 16.8 MB file 200 times, with the JVM heap size limited to 2 GB):
> Before modification: an OutOfMemoryError (OOM) occurred after just over 1
> minute of operation. Because heap memory was insufficient, Full GC (full
> garbage collection) was triggered frequently, which can be seen in the CPU
> usage curve on the left.
> !before.png!
> After modification: heap memory is reclaimed normally and no OOM occurs.
> !after.png!
>
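The difference between the two closing strategies described in the report can be sketched without PDFBox at all. The example below is only an illustration under stated assumptions: `FakeDoc`, `mergeDeferredClose` and `mergeImmediateClose` are hypothetical names, not PDFBox classes or methods, and the counter merely models how many source documents are open at the same time.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

public class CloseTimingSketch {

    /** Hypothetical stand-in for a loaded source document; tracks how many are open. */
    static final class FakeDoc implements Closeable {
        static int open = 0;
        static int peakOpen = 0;

        FakeDoc() {
            open++;
            peakOpen = Math.max(peakOpen, open);
        }

        @Override
        public void close() {
            open--;
        }
    }

    static void resetCounters() {
        FakeDoc.open = 0;
        FakeDoc.peakOpen = 0;
    }

    /** Deferred close: every source stays open until the whole loop has finished. */
    static int mergeDeferredClose(int sourceCount) {
        resetCounters();
        List<FakeDoc> toBeClosed = new ArrayList<>();
        try {
            for (int i = 0; i < sourceCount; i++) {
                FakeDoc doc = new FakeDoc();
                toBeClosed.add(doc);
                // the append step would run here
            }
        } finally {
            for (FakeDoc doc : toBeClosed) {
                doc.close();
            }
        }
        return FakeDoc.peakOpen;
    }

    /** Immediate close: each source is closed right after it has been appended. */
    static int mergeImmediateClose(int sourceCount) {
        resetCounters();
        for (int i = 0; i < sourceCount; i++) {
            try (FakeDoc doc = new FakeDoc()) {
                // the append step would run here
            }
        }
        return FakeDoc.peakOpen;
    }

    public static void main(String[] args) {
        System.out.println("deferred close, peak open:  " + mergeDeferredClose(200));
        System.out.println("immediate close, peak open: " + mergeImmediateClose(200));
    }
}
```

With 200 sources, the deferred variant peaks at 200 simultaneously open documents and the immediate variant at 1. The sketch only models resource lifetime; whether the merged destination remains valid once its sources are closed would have to be verified against PDFBox itself.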