[ https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925443#comment-17925443 ]

asdpboy commented on PDFBOX-5950:
---------------------------------

[~tilman] Thank you for replying. I have done a lot of testing, and I have not 
been able to reproduce the situation you mentioned with my modified code. From 
reading the code of legacyMergeDocuments, it does not appear that any other 
objects of sourceDoc are still being held. Can you show me code that 
reproduces the situation you mentioned? 
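To illustrate the difference the issue below is about, here is a minimal sketch in plain Java (no PDFBox dependency) contrasting two lifetimes: keeping every loaded source in a `tobeclosed` list until the merge ends, versus closing each source right after it has been appended. `CloseTimingDemo` and `FakeDocument` are hypothetical stand-ins that only count how many sources are open at the same time; they say nothing about how much memory a real `PDDocument` uses.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

public class CloseTimingDemo
{
    // Hypothetical stand-in for a loaded source document (NOT a PDFBox class):
    // it only counts how many instances are open at the same time.
    static final class FakeDocument implements Closeable
    {
        static int open = 0;
        static int peak = 0;
        private boolean closed = false;

        FakeDocument()
        {
            open++;
            peak = Math.max(peak, open);
        }

        @Override
        public void close()
        {
            if (!closed)
            {
                closed = true;
                open--;
            }
        }
    }

    static void resetCounters()
    {
        FakeDocument.open = 0;
        FakeDocument.peak = 0;
    }

    // Pattern 1: keep every source in a list and close them all at the end.
    static int mergeDeferredClose(int sources)
    {
        resetCounters();
        List<FakeDocument> tobeclosed = new ArrayList<>();
        try
        {
            for (int i = 0; i < sources; i++)
            {
                FakeDocument doc = new FakeDocument();
                tobeclosed.add(doc);
                // appendDocument(destination, doc) would run here
            }
        }
        finally
        {
            for (FakeDocument doc : tobeclosed)
            {
                doc.close();
            }
        }
        return FakeDocument.peak;
    }

    // Pattern 2: close each source as soon as it has been appended.
    static int mergeImmediateClose(int sources)
    {
        resetCounters();
        for (int i = 0; i < sources; i++)
        {
            FakeDocument doc = new FakeDocument();
            try
            {
                // appendDocument(destination, doc) would run here
            }
            finally
            {
                doc.close();
            }
        }
        return FakeDocument.peak;
    }

    public static void main(String[] args)
    {
        System.out.println("deferred close, peak open: " + mergeDeferredClose(200));
        System.out.println("immediate close, peak open: " + mergeImmediateClose(200));
    }
}
```

With 200 sources the first pattern peaks at 200 simultaneously open documents and the second at 1, so in the second pattern the number of sources still reachable stays constant instead of growing with the input.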

> pdfbox  PDFMergerUtility Potential OOM issue
> --------------------------------------------
>
>                 Key: PDFBOX-5950
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5950
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.32, 3.0.0 PDFBox
>         Environment: jdk11
>            Reporter: asdpboy
>            Priority: Major
>         Attachments: after.png, before.png, oom.png
>
>
> I have identified a potential bug in Apache PDFBox and would like to report 
> it. Below are the details:
>  
> When there are a large number of sources (e.g., thousands), the `tobeclosed` 
> list keeps every loaded PDF document open, and therefore in memory, until 
> the whole merge has finished. This may cause an OutOfMemoryError (OOM) 
> during the merge process.
>  
> The following adjustment closes each sourceDoc immediately after it has 
> been appended:
>  
> org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
> {code:java}
> for (Object sourceObject : sources)
> {
>     PDDocument sourceDoc = null;
>     if (sourceObject instanceof File)
>     {
>         sourceDoc = PDDocument.load((File) sourceObject, partitionedMemSetting);
>     }
>     else
>     {
>         sourceDoc = PDDocument.load((InputStream) sourceObject, partitionedMemSetting);
>     }
>     try
>     {
>         appendDocument(destination, sourceDoc);
>     }
>     finally
>     {
>         // close the source right away instead of keeping it in tobeclosed
>         // until the whole merge has finished
>         IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
>     }
> }
> {code}
> One of the OOM cases:
> !oom.png!  
> Comparison of memory usage before and after the modification (merging a 
> 16.8 MB file 200 times, with the JVM heap size limited to 2 GB):
> Before modification: an OutOfMemoryError (OOM) occurred after just over one 
> minute of operation. Because heap memory was insufficient, full garbage 
> collections (Full GC) were triggered frequently, as the CPU usage curve on 
> the left shows.
> !before.png!
> After modification: heap memory is collected normally and no OOM occurs.
> !after.png!
>  
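The per-source lifetime in the proposed patch could also be written with try-with-resources, which works for `PDDocument` because it implements `Closeable`. The sketch below is plain Java with hypothetical stand-ins (`TryWithResourcesMerge`, `FakeDocument`, and an `appendDocument` that is not PDFBox's method); it only shows that each source is closed as soon as its block exits, even when appending fails. One behavioural difference from the patch, as far as I can tell: `IOUtils.closeAndLogException` logs a failure to close, whereas try-with-resources rethrows it (or attaches it as a suppressed exception if the append itself threw).

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesMerge
{
    // Hypothetical source document (NOT a PDFBox class); counts closes
    // so that the lifetime of each source is observable.
    static final class FakeDocument implements Closeable
    {
        static int closedCount = 0;
        final String name;

        FakeDocument(String name)
        {
            this.name = name;
        }

        @Override
        public void close()
        {
            closedCount++;
        }
    }

    // Hypothetical append step: records the source name in the destination
    // and fails on a source named "broken" to show that close still happens.
    static void appendDocument(List<String> destination, FakeDocument source)
    {
        if (source.name.equals("broken"))
        {
            throw new IllegalStateException("cannot append " + source.name);
        }
        destination.add(source.name);
    }

    static List<String> merge(List<String> sources)
    {
        List<String> destination = new ArrayList<>();
        for (String name : sources)
        {
            // sourceDoc is closed when this block exits, normally or by an
            // exception, so at most one source is open at any time
            try (FakeDocument sourceDoc = new FakeDocument(name))
            {
                appendDocument(destination, sourceDoc);
            }
        }
        return destination;
    }

    public static void main(String[] args)
    {
        System.out.println(merge(List.of("a.pdf", "b.pdf", "c.pdf")));
        System.out.println("closed: " + FakeDocument.closedCount);
    }
}
```

Running `main` prints `[a.pdf, b.pdf, c.pdf]` followed by `closed: 3`.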



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
