[ 
https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925511#comment-17925511
 ] 

Tilman Hausherr edited comment on PDFBOX-5950 at 2/10/25 10:51 AM:
-------------------------------------------------------------------

Your change would work, provided another change is made as well:
{code:java}
private void mergeInto(COSDictionary src, COSDictionary dst,
        PDFCloneUtility cloner, Set<COSName> exclude) throws IOException
{
    for (Map.Entry<COSName, COSBase> entry : src.entrySet())
    {
        if (!exclude.contains(entry.getKey()) && !dst.containsKey(entry.getKey()))
        {
            //dst.setItem(entry.getKey(), entry.getValue());
            dst.setItem(entry.getKey(), cloner.cloneForNewDocument(entry.getValue()));
        }
    }
}
{code}
A similar change is also needed in {{mergeRoleMap}} where {{setItem}} is called, 
despite the comment "clone not needed".
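
To illustrate why the clone matters once the source document is closed right after appending, here is a minimal, library-free sketch. It is not PDFBox code: {{Node}}, {{close}} and {{survivesClose}} are hypothetical stand-ins, and the {{closed}} flag merely models a value whose backing data becomes unusable when its owning document is closed. The merge loop has the same shape as {{mergeInto}} above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins, not PDFBox classes: a Node models a value whose
// backing data becomes unusable once its owning document is closed.
class CloneOnMergeSketch
{
    static final class Node
    {
        final String payload;
        boolean closed;

        Node(String payload)
        {
            this.payload = payload;
        }

        // Independent copy that is unaffected by closing the original.
        Node copy()
        {
            return new Node(payload);
        }

        String read()
        {
            if (closed)
            {
                throw new IllegalStateException("owner already closed");
            }
            return payload;
        }
    }

    // Same shape as mergeInto: copy entries missing in dst, skipping excluded keys.
    static void mergeInto(Map<String, Node> src, Map<String, Node> dst,
            Set<String> exclude, boolean clone)
    {
        for (Map.Entry<String, Node> entry : src.entrySet())
        {
            if (!exclude.contains(entry.getKey()) && !dst.containsKey(entry.getKey()))
            {
                Node value = entry.getValue();
                dst.put(entry.getKey(), clone ? value.copy() : value);
            }
        }
    }

    // Models closing the source document: every value it owns becomes unreadable.
    static void close(Map<String, Node> src)
    {
        for (Node node : src.values())
        {
            node.closed = true;
        }
    }

    // Returns true if dst can still be read after src has been closed.
    static boolean survivesClose(boolean clone)
    {
        Map<String, Node> src = new LinkedHashMap<>();
        src.put("Lang", new Node("en"));
        src.put("Pages", new Node("excluded"));
        Map<String, Node> dst = new LinkedHashMap<>();

        mergeInto(src, dst, Set.of("Pages"), clone);
        close(src);
        try
        {
            return "en".equals(dst.get("Lang").read()) && !dst.containsKey("Pages");
        }
        catch (IllegalStateException e)
        {
            return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println("shared reference survives close: " + survivesClose(false));
        System.out.println("cloned value survives close: " + survivesClose(true));
    }
}
```

In this model, a value that is only referenced from the destination breaks as soon as the source is closed, while a cloned value stays readable.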

(more related changes are needed but these are obvious)

However, I'm going to sleep on this for at least one night.



> pdfbox  PDFMergerUtility Potential OOM issue
> --------------------------------------------
>
>                 Key: PDFBOX-5950
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5950
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.32, 3.0.0 PDFBox
>         Environment: jdk11
>            Reporter: asdpboy
>            Priority: Major
>         Attachments: after.png, before.png, oom.png
>
>
> I have identified a potential bug in Apache PDFBox and would like to report 
> it. Below are the details:
>  
> When there are a large number of sources (e.g., thousands), the `tobeclosed` 
> list keeps every loaded PDF document in memory until the merge finishes. This 
> poses a risk of Out-of-Memory (OOM) errors during the merge process.
>  
> The following adjustment can be made: close the sourceDoc object immediately 
> after it has been appended.
>  
> org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
> {code:java}
> for (Object sourceObject : sources)
> {
>     PDDocument sourceDoc = null;
>     if (sourceObject instanceof File)
>     {
>         sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
>     }
>     else
>     {
>         sourceDoc = PDDocument.load((InputStream) sourceObject, partitionedMemSetting);
>     }
>     try
>     {
>         appendDocument(destination, sourceDoc);
>     }
>     finally
>     {
>         IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
>     }
> }
> {code}
> One of the OOM cases:
> !oom.png!  
> Comparison of Memory Usage Before and After Modification (Merging a 16.8MB 
> File 200 Times, with JVM Heap Size Limit Set to 2GB)
> Before Modification: An OutOfMemoryError (OOM) occurred after just over 1 
> minute of operation. Due to insufficient heap memory, Full GC (Full Garbage 
> Collection) was triggered frequently, which can be observed from the CPU 
> usage curve on the left.
> !before.png!
> After Modification: The heap memory can now be collected normally 
> without causing an OOM.
> !after.png!
>  
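
The effect described in the report can be modelled without PDFBox. The following is only a sketch under stated assumptions: {{Doc}} is a hypothetical stand-in for a loaded document, and the counters track how many instances are open at the same time rather than actual heap usage. It compares collecting every document for a final cleanup with closing each one as soon as it has been processed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a loaded document; it only counts how many
// instances are open at once and records the highest count seen.
class EagerCloseSketch
{
    static int open;
    static int peak;

    static final class Doc implements AutoCloseable
    {
        Doc()
        {
            open++;
            peak = Math.max(peak, open);
        }

        @Override
        public void close()
        {
            open--;
        }
    }

    // Deferred strategy: keep every document until the loop has finished.
    static int peakWhenDeferred(int count)
    {
        open = 0;
        peak = 0;
        List<Doc> toBeClosed = new ArrayList<>();
        try
        {
            for (int i = 0; i < count; i++)
            {
                toBeClosed.add(new Doc());
            }
        }
        finally
        {
            for (Doc doc : toBeClosed)
            {
                doc.close();
            }
        }
        return peak;
    }

    // Eager strategy: close each document as soon as it has been processed.
    static int peakWhenEager(int count)
    {
        open = 0;
        peak = 0;
        for (int i = 0; i < count; i++)
        {
            try (Doc doc = new Doc())
            {
                // the append step would go here
            }
        }
        return peak;
    }

    public static void main(String[] args)
    {
        System.out.println("peak open (deferred): " + peakWhenDeferred(200));
        System.out.println("peak open (eager): " + peakWhenEager(200));
    }
}
```

In this model the deferred strategy holds as many documents open as there are sources, while the eager one never holds more than one. Whether closing early is safe in the real merger depends on the destination not sharing objects with the closed source.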



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
