Justin Lee created PDFBOX-3380:
----------------------------------
Summary: Small change to PDFSplit loop reduces memory consumption
Key: PDFBOX-3380
URL: https://issues.apache.org/jira/browse/PDFBOX-3380
Project: PDFBox
Issue Type: Improvement
Components: Utilities
Affects Versions: 2.0.2
Reporter: Justin Lee
Priority: Minor
I was trying to use PDFSplit to split a large scanned document into single
pages, and it very quickly ran out of memory. Looking through the code, the
cause appears to be that the splitter builds an in-memory model of every
cloned page before any of them are written to disk. I have created a patch
against 2.0.2 that fixes my immediate problem, in case it is helpful to
anybody. All it really does is move the outer processing loop into PDFSplit
so that each page is written to disk as soon as it has been split off. This
probably isn't an ideal fix, but I'm not familiar enough with the internals
of PDFBox to do much more.
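
To illustrate the idea, here is a toy model of the two loop shapes. It is
not the patch and it does not use any PDFBox classes; SplitLoopSketch,
PageCopy, Tracker and both method names are made up for illustration. It only
counts how many page copies are held at the same time, assuming one copy per
page and that a copy can be released once it has been written.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLoopSketch {

    /** Hypothetical stand-in for one cloned page held in memory. */
    static final class PageCopy {
        final int pageNumber;
        PageCopy(int pageNumber) { this.pageNumber = pageNumber; }
    }

    /** Counts how many page copies are live at once and remembers the peak. */
    static final class Tracker {
        int live = 0;
        int peak = 0;
        void allocated() { live++; peak = Math.max(peak, live); }
        void released() { live--; }
    }

    /** Clone every page first, then write them all: peak grows with page count. */
    static int splitAllThenWrite(int pageCount, List<Integer> written) {
        Tracker tracker = new Tracker();
        List<PageCopy> copies = new ArrayList<>();
        for (int i = 1; i <= pageCount; i++) {
            copies.add(new PageCopy(i));
            tracker.allocated();
        }
        for (PageCopy copy : copies) {
            written.add(copy.pageNumber); // stands in for saving to disk
            tracker.released();
        }
        return tracker.peak;
    }

    /** Clone one page, write it, release it, then move on: peak stays at one. */
    static int splitAndWriteEachPage(int pageCount, List<Integer> written) {
        Tracker tracker = new Tracker();
        for (int i = 1; i <= pageCount; i++) {
            PageCopy copy = new PageCopy(i);
            tracker.allocated();
            written.add(copy.pageNumber); // stands in for saving to disk
            tracker.released();
        }
        return tracker.peak;
    }

    public static void main(String[] args) {
        List<Integer> batchOut = new ArrayList<>();
        List<Integer> perPageOut = new ArrayList<>();
        System.out.println("batch peak: " + splitAllThenWrite(500, batchOut));
        System.out.println("per-page peak: " + splitAndWriteEachPage(500, perPageOut));
    }
}
```

Both variants write the same pages in the same order; only the number of
copies alive at once differs, which is the effect the patch is after.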