ds3v opened a new pull request #131:
URL: https://github.com/apache/pdfbox/pull/131


   We are trying to use PDFMergerUtility to merge a large number of PDF 
files (up to 50,000 pages in total), but heap usage grows too large. 
Using setupTempFileOnly() does not help.  
   We analyzed heap dumps and found that most of the heap is taken up by 
current page buffers (byte arrays of 4096 bytes) that are referenced from 
ScratchFileBuffer. 
   The idea of this fix is to drop the ScratchFileBuffer's reference to its 
page buffer once the COSStream has been completely processed.
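   
   A minimal sketch of that idea, not the actual patch: `PageBackedBuffer` and 
`ReleasePageDemo` are hypothetical names, and the class is a simplified 
stand-in rather than PDFBox's ScratchFileBuffer. It only shows that a buffer 
which clears its page reference on close no longer pins a 4096-byte array, 
even if the buffer object itself stays reachable.

   ```java
   import java.io.Closeable;
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical, simplified stand-in for a page-backed buffer.
   // This is NOT the real ScratchFileBuffer implementation.
   class PageBackedBuffer implements Closeable
   {
       private static final int PAGE_SIZE = 4096;

       // Current in-memory page; cleared on close so the GC can reclaim it.
       private byte[] currentPage = new byte[PAGE_SIZE];
       private int position = 0;

       void write(int b)
       {
           if (currentPage == null)
           {
               throw new IllegalStateException("Buffer already closed");
           }
           if (position == PAGE_SIZE)
           {
               // A real buffer would move on to another page here;
               // this sketch simply reuses the same page.
               position = 0;
           }
           currentPage[position++] = (byte) b;
       }

       boolean holdsPage()
       {
           return currentPage != null;
       }

       @Override
       public void close()
       {
           // Drop the reference once the stream is fully processed.
           currentPage = null;
       }
   }

   class ReleasePageDemo
   {
       public static void main(String[] args)
       {
           // The buffers stay reachable, as they would while a merge is running.
           List<PageBackedBuffer> processed = new ArrayList<PageBackedBuffer>();
           for (int i = 0; i < 1000; i++)
           {
               PageBackedBuffer buffer = new PageBackedBuffer();
               buffer.write(i);
               buffer.close();
               processed.add(buffer);
           }
           int stillHeld = 0;
           for (PageBackedBuffer buffer : processed)
           {
               if (buffer.holdsPage())
               {
                   stillHeld++;
               }
           }
           System.out.println("Buffers still holding a page: " + stillHeld);
       }
   }
   ```

   Without the assignment in `close()`, each of the 1000 retained buffers in 
this sketch would keep its page array alive for as long as the list exists.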
   
   Sample code:
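   ```java
   import java.io.Closeable;
   import java.io.File;
   import java.io.FileInputStream;
   import java.util.ArrayList;
   import java.util.List;

   import org.apache.pdfbox.io.IOUtils;
   import org.apache.pdfbox.io.MemoryUsageSetting;
   import org.apache.pdfbox.multipdf.PDFMergerUtility;

   public class PdfBoxLargePdf 
   {
       public static void main(String[] args) throws Exception
       {
           int fileCount = 2000;
           // Java does not expand "~", so resolve the path against user.home.
           File source = new File(System.getProperty("user.home"), "exchange/source.pdf");
           List<Closeable> toBeClosed = new ArrayList<Closeable>(fileCount);
           try 
           {
               PDFMergerUtility utility = new PDFMergerUtility();
               // Add the same source document fileCount times.
               for (int i = 0; i < fileCount; i++) 
               {
                   FileInputStream fis = new FileInputStream(source);
                   toBeClosed.add(fis);
                   utility.addSource(fis);
               }
               utility.setDestinationFileName("target/combined.pdf");
               // Buffer in temp files only, with no main-memory buffering.
               utility.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
           } 
           finally 
           {
               for (Closeable closeable : toBeClosed) 
               {
                   IOUtils.closeQuietly(closeable);
               }
           }
       }
   }
   ```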
   The sample uses PDFBox from the 2.0 branch and runs with the VM options 
`-Xmx1G -XX:+HeapDumpOnOutOfMemoryError`.
   
   On the 2.0 branch this code fails with an OutOfMemoryError after 
processing only 1058 source documents:
   
![image](https://user-images.githubusercontent.com/25397526/134963758-708437fc-0989-433a-b0d2-e507129c627b.png)
   
![image](https://user-images.githubusercontent.com/25397526/134963783-5c16ce04-7ead-476d-acef-a295a849901a.png)
    
   With the fix, the sample completes successfully and processes all 2000 
source files:
   
![image](https://user-images.githubusercontent.com/25397526/134963823-0d26d3bb-762f-4ef6-96f2-09361015f806.png)
   
![image](https://user-images.githubusercontent.com/25397526/134963848-1edab947-7102-4d99-8174-af41f71d8a24.png)
   
    
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


