[ 
https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5950:
------------------------------------
    Description: 
I have identified a potential bug in Apache PDFBox and would like to report it. 
Below are the details:
 
When there are a large number of sources (e.g., thousands), every loaded PDF 
document is kept in memory via `tobeclosed` until the merge finishes. This 
poses a risk of Out-of-Memory (OOM) errors during the merge process.
 
The following adjustment can be made: close the sourceDoc object immediately 
after it has been appended.
 
org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
{code:java}
for (Object sourceObject : sources)
{
    PDDocument sourceDoc = null;
    if (sourceObject instanceof File)
    {
        sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
    }
    else
    {
        sourceDoc = PDDocument.load((InputStream) sourceObject,
                partitionedMemSetting);
    }
    try
    {
        appendDocument(destination, sourceDoc);
    }
    finally
    {
        IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
    }
}
{code}
One of the OOM cases:
!oom.png!  

Comparison of Memory Usage Before and After Modification (Merging a 16.8MB File 
200 Times, with JVM Heap Size Limit Set to 2GB)

Before Modification: An OutOfMemoryError (OOM) occurred after just over 1 
minute of operation. Due to insufficient heap memory, Full GC (Full Garbage 
Collection) was triggered frequently, which can be observed from the CPU usage 
curve on the left.
!before.png!
After Modification: Heap memory is reclaimed normally and no OOM occurs.
!after.png!
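
The difference between the two closing strategies can be illustrated with a minimal sketch that does not use PDFBox at all. `FakeDoc` is a hypothetical stand-in for `PDDocument`, and the counters only model how many documents are open at the same time, not real heap usage. The sketch also does not show whether `appendDocument` tolerates its source being closed early; that is assumed here.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseTimingSketch {

    /** Hypothetical stand-in for PDDocument: tracks how many instances are open at once. */
    static final class FakeDoc implements AutoCloseable {
        static int open = 0;
        static int peakOpen = 0;

        FakeDoc() {
            open++;
            peakOpen = Math.max(peakOpen, open);
        }

        @Override
        public void close() {
            open--;
        }
    }

    static void resetCounters() {
        FakeDoc.open = 0;
        FakeDoc.peakOpen = 0;
    }

    /** Deferred close: every source stays open until the whole loop is done. */
    static int peakWithDeferredClose(int sourceCount) {
        resetCounters();
        List<FakeDoc> toBeClosed = new ArrayList<>();
        try {
            for (int i = 0; i < sourceCount; i++) {
                FakeDoc doc = new FakeDoc();
                toBeClosed.add(doc);
                // appendDocument(destination, doc) would run here
            }
        } finally {
            for (FakeDoc doc : toBeClosed) {
                doc.close();
            }
        }
        return FakeDoc.peakOpen;
    }

    /** Immediate close: each source is closed before the next one is opened. */
    static int peakWithImmediateClose(int sourceCount) {
        resetCounters();
        for (int i = 0; i < sourceCount; i++) {
            try (FakeDoc doc = new FakeDoc()) {
                // appendDocument(destination, doc) would run here
            }
        }
        return FakeDoc.peakOpen;
    }

    public static void main(String[] args) {
        System.out.println("deferred close, peak open:  " + peakWithDeferredClose(200));
        System.out.println("immediate close, peak open: " + peakWithImmediateClose(200));
    }
}
```

With 200 sources, the deferred variant has all 200 stand-in documents open at its peak, while the immediate variant never has more than one open.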
 

  was:
I have identified a potential bug in Apache PDFBox and would like to report it. 
Below are the details:
 
 - PDFBox Version: 2.0.32, 3.0.0
 - Java Version: 11
 
When there are a large number of sources (e.g., thousands), the `tobeclosed` 
method will load the PDF document into memory. This may pose a risk of 
Out-of-Memory (OOM) during the merge process.
 
The following adjustments can be made,close the sourceDoc object immediately .
 
org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
for (Object sourceObject : sources)
{
    PDDocument sourceDoc = null;
    if (sourceObject instanceof File)
    {
        sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
    }
    else
    {
        sourceDoc = PDDocument.load((InputStream) sourceObject, partitionedMemSetting);
    }
    try
    {
        appendDocument(destination, sourceDoc);
    }
    finally
    {
        IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
    }
}
 
one of the oom case
!oom.png!  

Comparison of Memory Usage Before and After Modification (Merging a 16.8MB File 
200 Times, with JVM Heap Size Limit Set to 2GB)

Before Modification: An OutOfMemoryError (OOM) occurred after just over 1 
minute of operation. Due to insufficient heap memory, Full GC (Full Garbage 
Collection) was triggered frequently, which can be observed from the CPU usage 
curve on the left.
!before.png!
 After Modification: The heap memory is now able to be collected normally 
without causing an OOM.
!after.png!
 


> pdfbox  PDFMergerUtility Potential OOM issue
> --------------------------------------------
>
>                 Key: PDFBOX-5950
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5950
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.32, 3.0.0 PDFBox
>         Environment: jdk11
>            Reporter: asdpboy
>            Priority: Major
>         Attachments: after.png, before.png, oom.png
>
>
> I have identified a potential bug in Apache PDFBox and would like to report 
> it. Below are the details:
>  
> When there are a large number of sources (e.g., thousands), every loaded PDF 
> document is kept in memory via `tobeclosed` until the merge finishes. This 
> poses a risk of Out-of-Memory (OOM) errors during the merge process.
>  
> The following adjustment can be made: close the sourceDoc object immediately 
> after it has been appended.
>  
> org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
> {code:java}
> for (Object sourceObject : sources)
> {
>     PDDocument sourceDoc = null;
>     if (sourceObject instanceof File)
>     {
>         sourceDoc = PDDocument.load((File) sourceObject,
>                 partitionedMemSetting);
>     }
>     else
>     {
>         sourceDoc = PDDocument.load((InputStream) sourceObject,
>                 partitionedMemSetting);
>     }
>     try
>     {
>         appendDocument(destination, sourceDoc);
>     }
>     finally
>     {
>         IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
>     }
> }
> {code}
> One of the OOM cases:
> !oom.png!  
> Comparison of Memory Usage Before and After Modification (Merging a 16.8MB 
> File 200 Times, with JVM Heap Size Limit Set to 2GB)
> Before Modification: An OutOfMemoryError (OOM) occurred after just over 1 
> minute of operation. Due to insufficient heap memory, Full GC (Full Garbage 
> Collection) was triggered frequently, which can be observed from the CPU 
> usage curve on the left.
> !before.png!
> After Modification: Heap memory is reclaimed normally and no OOM occurs.
> !after.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
