[ 
https://issues.apache.org/jira/browse/PDFBOX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925283#comment-17925283
 ] 

Tilman Hausherr commented on PDFBOX-5950:
-----------------------------------------

Even if the sourceDoc object isn't used any further, it holds a tree of other 
objects, which includes streams. Closing sourceDoc closes these streams as 
well. If one of them is also held by the destination document (because for 
some reason the reference to the stream was copied instead of the stream being 
fully cloned), then saving the destination will throw an exception because 
that stream has already been closed; see e.g. 
https://stackoverflow.com/questions/63589763/ which also links to similar 
problems.
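
To illustrate the failure mode, here is a minimal sketch. It is a hypothetical 
model only: ModelStream, ModelDocument and mergeThenSave are made-up stand-ins 
and not PDFBox classes, and the real object tree is far more involved. It only 
shows why a stream reference shared between source and destination breaks once 
the source is closed before the destination is saved.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: none of these types exist in PDFBox.
class SharedStreamDemo {

    /** Stand-in for a stream object owned by a document. */
    static final class ModelStream implements Closeable {
        private boolean closed;

        @Override
        public void close() {
            closed = true;
        }

        byte[] read() throws IOException {
            if (closed) {
                throw new IOException("stream has already been closed");
            }
            return new byte[] {1, 2, 3};
        }
    }

    /** Stand-in for a document that owns a collection of streams. */
    static final class ModelDocument implements Closeable {
        final List<ModelStream> streams = new ArrayList<>();

        /** Copies references instead of cloning: the risky case. */
        void appendByReference(ModelDocument source) {
            streams.addAll(source.streams);
        }

        /** Saving has to read every stream the document holds. */
        int save() throws IOException {
            int bytes = 0;
            for (ModelStream s : streams) {
                bytes += s.read().length;
            }
            return bytes;
        }

        /** Closing a document closes all streams it holds. */
        @Override
        public void close() {
            for (ModelStream s : streams) {
                s.close();
            }
        }
    }

    /** Returns true if saving the destination succeeds. */
    static boolean mergeThenSave(boolean closeSourceEarly) {
        ModelDocument destination = new ModelDocument();
        ModelDocument source = new ModelDocument();
        source.streams.add(new ModelStream());
        destination.appendByReference(source);
        if (closeSourceEarly) {
            source.close(); // also closes the stream shared with destination
        }
        try {
            destination.save();
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            source.close();
            destination.close();
        }
    }

    public static void main(String[] args) {
        System.out.println("close after save:  " + mergeThenSave(false));
        System.out.println("close before save: " + mergeThenSave(true));
    }
}
```

In this model the save only succeeds when the source is closed after the 
destination has been saved, which is the reason for keeping the source 
documents open until the end of the merge.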

> pdfbox  PDFMergerUtility Potential OOM issue
> --------------------------------------------
>
>                 Key: PDFBOX-5950
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5950
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.32, 3.0.0 PDFBox
>         Environment: jdk11
>            Reporter: asdpboy
>            Priority: Major
>         Attachments: after.png, before.png, oom.png
>
>
> I have identified a potential bug in Apache PDFBox and would like to report 
> it. Below are the details:
>  
> When there are a large number of sources (e.g., thousands), every loaded 
> PDDocument is added to the `tobeclosed` list and stays in memory until the 
> merge has finished. This poses a risk of Out-of-Memory (OOM) during the 
> merge process.
>  
> The following adjustment can be made: close the sourceDoc object immediately 
> after it has been appended.
>  
> org.apache.pdfbox.multipdf.PDFMergerUtility#legacyMergeDocuments
> {code:java}
> for (Object sourceObject : sources)
> {
>     PDDocument sourceDoc = null;
>     if (sourceObject instanceof File)
>     {
>         sourceDoc = PDDocument.load((File) sourceObject,
>                 partitionedMemSetting);
>     }
>     else
>     {
>         sourceDoc = PDDocument.load((InputStream) sourceObject,
>                 partitionedMemSetting);
>     }
>     try
>     {
>         appendDocument(destination, sourceDoc);
>     }
>     finally
>     {
>         IOUtils.closeAndLogException(sourceDoc, LOG, "PDDocument", null);
>     }
> }
> {code}
> One of the OOM cases:
> !oom.png!  
> Comparison of Memory Usage Before and After Modification (Merging a 16.8MB 
> File 200 Times, with JVM Heap Size Limit Set to 2GB)
> Before Modification: An OutOfMemoryError (OOM) occurred after just over 1 
> minute of operation. Due to insufficient heap memory, Full GC (Full Garbage 
> Collection) was triggered frequently, which can be observed from the CPU 
> usage curve on the left.
> !before.png!
> After Modification: Heap memory is now reclaimed normally by the garbage 
> collector and no OOM occurs.
> !after.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
