Justin Lee created PDFBOX-3380:
----------------------------------

             Summary: Small change to PDFSplit loop reduces memory consumption
                 Key: PDFBOX-3380
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3380
             Project: PDFBox
          Issue Type: Improvement
          Components: Utilities
    Affects Versions: 2.0.2
            Reporter: Justin Lee
            Priority: Minor


I was trying to use PDFSplit to split a large scanned document into single 
pages.  It very quickly ran out of memory.  I poked around in the code, and it 
looks to me like the issue is that the splitter code tries to create an 
in-memory model of every single cloned page before writing them to disk.  I 
created a patch based off of 2.0.2 that fixes my immediate problem in case it 
is helpful to anybody.  All it really does is move the outer processing loop to 
PDFSplit so it can write to disk after each page.  This probably isn't an ideal 
fix, but I'm not familiar enough with the internals of PDFBox to do much more.
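To make the memory argument concrete, here is a minimal Java sketch of the two loop shapes. It does not use the PDFBox API. `StreamingSplitSketch`, `MemoryTracker`, `splitEagerly` and `splitStreaming` are hypothetical names, and each cloned single-page document is modeled as a 1 KB byte array so the sketch can count how many are alive at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model, NOT PDFBox classes: a "single-page document" is
// represented by a byte array so we can count how many exist at once.
final class StreamingSplitSketch {

    /** Counts live single-page documents and records the peak. */
    static final class MemoryTracker {
        int live = 0;
        int peak = 0;

        byte[] allocate(int size) {
            live++;
            peak = Math.max(peak, live);
            return new byte[size];
        }

        void release() {
            live--;
        }
    }

    // Eager shape (as described in the report): clone every page into its
    // own document, keep them all in a list, and only then write them out.
    static int splitEagerly(int pageCount, MemoryTracker mem, Consumer<byte[]> writer) {
        List<byte[]> docs = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            docs.add(mem.allocate(1024));
        }
        for (byte[] doc : docs) {
            writer.accept(doc);
            mem.release();
        }
        return mem.peak;
    }

    // Streaming shape (the idea behind the patch): the loop lives in the
    // caller, so each document is written and released before the next
    // page is cloned.
    static int splitStreaming(int pageCount, MemoryTracker mem, Consumer<byte[]> writer) {
        for (int i = 0; i < pageCount; i++) {
            byte[] doc = mem.allocate(1024);
            writer.accept(doc);
            mem.release();
        }
        return mem.peak;
    }
}
```

Splitting 500 pages, the eager loop holds all 500 single-page documents at its peak, while the streaming loop never holds more than one. In PDFBox 2.0, `Splitter.split(PDDocument)` returns a `List<PDDocument>`, which corresponds to the eager shape; the patch described above moves the per-page loop into PDFSplit so each document can be written to disk before the next one is built.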



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)