Timo Boehme created PDFBOX-2883:
-----------------------------------
Summary: Unify memory handling
Key: PDFBOX-2883
URL: https://issues.apache.org/jira/browse/PDFBOX-2883
Project: PDFBox
Issue Type: Improvement
Components: Parsing
Affects Versions: 2.0.0
Reporter: Timo Boehme
Assignee: Timo Boehme
PDFBox now has at least two different mechanisms for choosing between main
memory and a temporary file for large data: when an input stream is provided,
the stream is copied to a temporary file, and all PDF streams that are read
are handled by RandomAccessBuffer/ScratchFile.
In PDFBOX-2882 I've re-implemented ScratchFile; the new version is quite fast
and allows setting a maximum amount of memory to be used for its pages before
it starts using the scratch file. This implementation could be used as the
general 'backend' for all buffered streams and even for the copy of the file
input stream. As long as the PDF fits into the allowed maximum memory, it
should be as fast as RandomAccessBuffer, while still giving good control over
memory usage by falling back to the scratch file when needed. This prevents
OOM errors with large files.
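The memory-first, file-later behaviour described above can be illustrated with a deliberately simplified sketch. It is not the page-based ScratchFile from PDFBOX-2882: the class name SpillingBuffer, its methods, and the byte-count threshold are all hypothetical, and it only supports appending and reading everything back.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration; not the ScratchFile class from PDFBOX-2882. */
final class SpillingBuffer implements AutoCloseable {
    private final long maxMemoryBytes;   // assumed cap on in-memory data
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream fileOut;        // non-null once data has spilled
    private Path tempFile;
    private long size;

    SpillingBuffer(long maxMemoryBytes) {
        this.maxMemoryBytes = maxMemoryBytes;
    }

    /** Appends data; moves everything to a temp file once the cap would be exceeded. */
    void append(byte[] data) {
        try {
            if (fileOut == null && size + data.length > maxMemoryBytes) {
                tempFile = Files.createTempFile("scratch-sketch", ".tmp");
                fileOut = Files.newOutputStream(tempFile);
                memory.writeTo(fileOut);  // copy what was buffered so far
                memory = null;            // release the heap copy
            }
            if (fileOut != null) {
                fileOut.write(data);
            } else {
                memory.write(data);
            }
            size += data.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isSpilled() {
        return fileOut != null;
    }

    long size() {
        return size;
    }

    /** Reads everything back; demo only, a real backend would offer random access. */
    byte[] readAll() {
        try {
            if (fileOut == null) {
                return memory.toByteArray();
            }
            fileOut.flush();
            return Files.readAllBytes(tempFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the file stream (if any) and deletes the temp file. */
    @Override
    public void close() {
        try {
            if (fileOut != null) {
                fileOut.close();
                Files.deleteIfExists(tempFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a cap of 16 bytes, appending 10 bytes keeps the data on the heap, and appending 10 more moves all 20 bytes to the temp file. A small document never touches the disk, while a large one has bounded heap usage.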
To use this, the PDDocument methods should be changed so that they no longer
take a 'useScratchFile' parameter but instead take a MemoryHandling object that
describes the buffering strategy (whether to use ScratchFile, how much main
memory may be used, ...).
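One possible shape for such a parameter object is sketched below, purely as a basis for discussion. Only the name MemoryHandling comes from this proposal; the factory methods, the accessors, and the fromLegacyFlag helper are assumptions made for illustration and are not existing PDFBox API.

```java
/** Hypothetical sketch of the proposed MemoryHandling parameter object. */
final class MemoryHandling {
    private final boolean useMainMemory;
    private final boolean useScratchFile;
    private final long maxMainMemoryBytes;  // -1 means "no limit" in this sketch

    private MemoryHandling(boolean useMainMemory, boolean useScratchFile,
                           long maxMainMemoryBytes) {
        this.useMainMemory = useMainMemory;
        this.useScratchFile = useScratchFile;
        this.maxMainMemoryBytes = maxMainMemoryBytes;
    }

    /** Buffer everything on the heap, without a limit. */
    static MemoryHandling mainMemoryOnly() {
        return new MemoryHandling(true, false, -1);
    }

    /** Buffer everything in the scratch file. */
    static MemoryHandling scratchFileOnly() {
        return new MemoryHandling(false, true, 0);
    }

    /** Use the heap up to the given number of bytes, then the scratch file. */
    static MemoryHandling mixed(long maxMainMemoryBytes) {
        if (maxMainMemoryBytes <= 0) {
            throw new IllegalArgumentException("maxMainMemoryBytes must be positive");
        }
        return new MemoryHandling(true, true, maxMainMemoryBytes);
    }

    /** Maps the old boolean 'useScratchFile' flag to an equivalent strategy. */
    static MemoryHandling fromLegacyFlag(boolean useScratchFile) {
        return useScratchFile ? scratchFileOnly() : mainMemoryOnly();
    }

    boolean usesMainMemory() {
        return useMainMemory;
    }

    boolean usesScratchFile() {
        return useScratchFile;
    }

    long maxMainMemoryBytes() {
        return maxMainMemoryBytes;
    }
}
```

A caller would then pass something like MemoryHandling.mixed(64L * 1024 * 1024) where it currently passes a boolean. An immutable object with named factories leaves room for further options later without another signature change.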
I've opened this issue for discussion. Since it requires API changes in
PDDocument, it should be done before the 2.0 release.