[ 
https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925994#comment-17925994
 ] 

Tilman Hausherr commented on PDFBOX-5950:
-----------------------------------------

The test is just to provide evidence for my argument; you don't have to do 
anything. I'll need some more time to run merge tests with 230000 files to see 
whether my local code causes any problems.

> pdfbox  PDFMergerUtility Potential OOM issue
> --------------------------------------------
>
>                 Key: PDFBOX-5950
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5950
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.32, 3.0.0 PDFBox
>         Environment: jdk11
>            Reporter: asdpboy
>            Priority: Major
>         Attachments: PDFMergerUtility.java, after.png, before.png, oom.png
>
>
> I have identified a potential bug in Apache PDFBox and would like to report 
> it. Below are the details:
>  
> When there are a large number of sources (e.g., thousands), every loaded PDF 
> document is added to the `tobeclosed` list and stays in memory until the 
> merge finishes. This poses a risk of Out-of-Memory (OOM) during the merge 
> process.
>  
> The following adjustment can be made: close the sourceDoc object immediately 
> after it has been appended.
>  
> org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
> {code:java}
> for (Object sourceObject : sources)
> {
>     PDDocument sourceDoc = null;
>     if (sourceObject instanceof File)
>     {
>         sourceDoc = PDDocument.load((File) sourceObject,
>                 partitionedMemSetting);
>     }
>     else
>     {
>         sourceDoc = PDDocument.load((InputStream) sourceObject,
>                 partitionedMemSetting);
>     }
>     try
>     {
>         appendDocument(destination, sourceDoc);
>     }
>     finally
>     {
>         IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
>     }
> }
> {code}
> One of the OOM cases:
> !oom.png!  
> Comparison of Memory Usage Before and After Modification (Merging a 16.8MB 
> File 200 Times, with JVM Heap Size Limit Set to 2GB)
> Before Modification: An OutOfMemoryError (OOM) occurred after just over 1 
> minute of operation. Due to insufficient heap memory, Full GC (Full Garbage 
> Collection) was triggered frequently, which can be observed from the CPU 
> usage curve on the left.
> !before.png!
> After Modification: Heap memory is now collected normally, without causing 
> an OOM.
> !after.png!
>  
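
The difference between the two closing strategies described above can be 
illustrated without PDFBox. The following is a minimal sketch, not the 
library's code: FakeDocument and CloseTimingSketch are hypothetical names, 
and FakeDocument only counts open instances instead of holding PDF data. 
The sketch does not model whether the merged document still needs data from 
a source after it has been appended; that question is specific to PDFBox.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PDDocument: it only tracks how many instances are open.
class FakeDocument implements Closeable
{
    static int open = 0;
    static int peak = 0;

    FakeDocument()
    {
        open++;
        peak = Math.max(peak, open);
    }

    @Override
    public void close()
    {
        open--;
    }

    static void reset()
    {
        open = 0;
        peak = 0;
    }
}

class CloseTimingSketch
{
    // Deferred close: every source is collected in a list and closed after the loop.
    static int peakWithDeferredClose(int sources)
    {
        FakeDocument.reset();
        List<FakeDocument> toBeClosed = new ArrayList<>();
        try
        {
            for (int i = 0; i < sources; i++)
            {
                toBeClosed.add(new FakeDocument());
                // the append step would go here
            }
        }
        finally
        {
            for (FakeDocument doc : toBeClosed)
            {
                doc.close();
            }
        }
        return FakeDocument.peak;
    }

    // Immediate close: each source is closed before the next one is opened.
    static int peakWithImmediateClose(int sources)
    {
        FakeDocument.reset();
        for (int i = 0; i < sources; i++)
        {
            try (FakeDocument doc = new FakeDocument())
            {
                // the append step would go here
            }
        }
        return FakeDocument.peak;
    }

    public static void main(String[] args)
    {
        System.out.println("deferred:  " + peakWithDeferredClose(200));
        System.out.println("immediate: " + peakWithImmediateClose(200));
    }
}
```

With 200 sources, the deferred variant reaches a peak of 200 simultaneously 
open documents, while the immediate variant never has more than one open.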



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
