[ 
https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

asdpboy updated PDFBOX-5950:
----------------------------
    Description: 
I have identified a potential bug in Apache PDFBox and would like to report it. 
Below are the details:
 
 - *PDFBox Version*: 2.0.32, 3.0.0
 - *Java Version*: 11
 
When there are a large number of sources (e.g., thousands), every loaded PDF 
document is added to the `tobeclosed` list and stays in memory until the merge 
finishes. This creates a risk of Out-of-Memory (OOM) errors during the merge.
 
The following adjustment can be made: close the sourceDoc object immediately 
after it has been appended.
 
org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
for (Object sourceObject : sources)
{
    PDDocument sourceDoc = null;
    if (sourceObject instanceof File)
    {
        sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
    }
    else
    {
        sourceDoc = PDDocument.load((InputStream) sourceObject, partitionedMemSetting);
    }
    // proposed change: close each source as soon as it has been appended
    try
    {
        appendDocument(destination, sourceDoc);
    }
    finally
    {
        IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
    }
}
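
To illustrate why the timing of close() matters here, the following is a minimal, self-contained sketch. FakeDoc and CloseTimingSketch are hypothetical stand-ins, not PDFBox classes. The sketch only counts how many documents are open at once under each strategy. It says nothing about whether the merged output stays valid when a source is closed early, which would have to be verified against PDFBox itself.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these types belong to PDFBox.
class CloseTimingSketch
{
    static int open = 0; // documents currently open
    static int peak = 0; // highest number of documents open at the same time

    // Stand-in for a loaded source document that holds memory while open.
    static final class FakeDoc implements Closeable
    {
        FakeDoc()
        {
            open++;
            peak = Math.max(peak, open);
        }

        @Override
        public void close()
        {
            open--;
        }
    }

    // Deferred closing: every document stays open until the loop has finished.
    static int peakWithDeferredClose(int sources)
    {
        open = 0;
        peak = 0;
        List<FakeDoc> toBeClosed = new ArrayList<>();
        try
        {
            for (int i = 0; i < sources; i++)
            {
                toBeClosed.add(new FakeDoc());
                // the append step would run here
            }
        }
        finally
        {
            for (FakeDoc doc : toBeClosed)
            {
                doc.close();
            }
        }
        return peak;
    }

    // Immediate closing: try-with-resources closes each document after use.
    static int peakWithImmediateClose(int sources)
    {
        open = 0;
        peak = 0;
        for (int i = 0; i < sources; i++)
        {
            try (FakeDoc doc = new FakeDoc())
            {
                // the append step would run here
            }
        }
        return peak;
    }

    public static void main(String[] args)
    {
        System.out.println("deferred:  " + peakWithDeferredClose(200));  // prints "deferred:  200"
        System.out.println("immediate: " + peakWithImmediateClose(200)); // prints "immediate: 1"
    }
}
```

With 200 sources, the deferred variant peaks at 200 open documents while the immediate variant never holds more than one.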
 
One of the OOM cases:

 
Comparison of Memory Usage Before and After Modification (Merging a 16.8MB File 
200 Times, with JVM Heap Size Limit Set to 2GB)

 - *Before Modification*: An OutOfMemoryError (OOM) occurred after just over 
1 minute of operation. Because heap memory was insufficient, Full GC (Full 
Garbage Collection) was triggered frequently, as the CPU usage curve on the 
left shows.
!https://cdn.nlark.com/yuque/0/2025/png/29126839/1738922546412-1478a69c-f8b5-4412-80b7-7f573f5ffa7f.png?x-oss-process=image%2Fformat%2Cwebp%2Fresize%2Cw_1500%2Climit_0!
 
 - *After Modification*: Heap memory is now reclaimed normally and no OOM 
occurs.
!https://cdn.nlark.com/yuque/0/2025/png/29126839/1738922398317-d36dce8f-8aea-4dcf-b809-78e2333b514f.png?x-oss-process=image%2Fformat%2Cwebp%2Fresize%2Cw_1500%2Climit_0!
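
As a rough back-of-envelope check of the test setup above, the sketch below works through the arithmetic. It rests on one loudly stated assumption that is not from the report: each source document kept open retains at least its on-disk size on the heap. The real footprint depends on the memory settings and on the document content, so this is only a plausibility estimate. HeapEstimateSketch and its members are hypothetical names.

```java
// Hypothetical estimate only; the per-document cost is an assumption, not a measurement.
class HeapEstimateSketch
{
    // Sizes are kept in tenths of a megabyte so that 16.8 MB stays an exact integer.
    static final long FILE_TENTHS_MB = 168;        // 16.8 MB per source, from the report
    static final long HEAP_TENTHS_MB = 2048L * 10; // 2 GB heap limit = 2048 MB

    // Assumed lower bound, in MB, on retained heap if all copies stay open at once.
    static long retainedMb(long copies)
    {
        return copies * FILE_TENTHS_MB / 10;
    }

    // Number of whole copies that fit under the heap limit under the same assumption.
    static long copiesThatFit()
    {
        return HEAP_TENTHS_MB / FILE_TENTHS_MB;
    }

    public static void main(String[] args)
    {
        System.out.println(retainedMb(200) + " MB retained for 200 copies"); // 3360 MB
        System.out.println(copiesThatFit() + " copies fit in 2048 MB");      // 121 copies
    }
}
```

Under that assumption, 200 copies would need at least 3360 MB, which exceeds the 2048 MB heap, and the limit would be reached after roughly 121 copies.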
 

  was:
I have identified a potential bug in Apache PDFBox and would like to report it. 
Below are the details:
 
- **PDFBox Version**: 2.0.32, 3.0.0
- **Java Version**: 11
 
When there are a large number of sources (e.g., thousands), every loaded PDF 
document is added to the `tobeclosed` list and stays in memory until the merge 
finishes. This creates a risk of Out-of-Memory (OOM) errors during the merge.
 
The following adjustment can be made: close the sourceDoc object immediately 
after it has been appended.
 
org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
for (Object sourceObject : sources)
{
    PDDocument sourceDoc = null;
    if (sourceObject instanceof File)
    {
        sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
    }
    else
    {
        sourceDoc = PDDocument.load((InputStream) sourceObject, partitionedMemSetting);
    }
    // proposed change: close each source as soon as it has been appended
    try
    {
        appendDocument(destination, sourceDoc);
    }
    finally
    {
        IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
    }
}
 
One of the OOM cases:
!https://cdn.nlark.com/yuque/0/2025/png/29126839/1738913671707-db4983a0-f7c8-4a6c-94b8-17d38f04a155.png?x-oss-process=image%2Fformat%2Cwebp%2Fresize%2Cw_1500%2Climit_0!
 
Comparison of Memory Usage Before and After Modification (Merging a 16.8MB File 
200 Times, with JVM Heap Size Limit Set to 2GB)
- **Before Modification**: An OutOfMemoryError (OOM) occurred after just over 1 
minute of operation. Because heap memory was insufficient, Full GC (Full 
Garbage Collection) was triggered frequently, as the CPU usage curve on the 
left shows.
!https://cdn.nlark.com/yuque/0/2025/png/29126839/1738922546412-1478a69c-f8b5-4412-80b7-7f573f5ffa7f.png?x-oss-process=image%2Fformat%2Cwebp%2Fresize%2Cw_1500%2Climit_0!
 
- **After Modification**: Heap memory is now reclaimed normally and no OOM 
occurs.
!https://cdn.nlark.com/yuque/0/2025/png/29126839/1738922398317-d36dce8f-8aea-4dcf-b809-78e2333b514f.png?x-oss-process=image%2Fformat%2Cwebp%2Fresize%2Cw_1500%2Climit_0!
 


> pdfbox  PDFMergerUtility Potential OOM issues
> ---------------------------------------------
>
>                 Key: PDFBOX-5950
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5950
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.32, 3.0.0 PDFBox
>         Environment: jdk11
>            Reporter: asdpboy
>            Priority: Major



