Timo Boehme created PDFBOX-2883:
-----------------------------------

             Summary: Unify memory handling
                 Key: PDFBOX-2883
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2883
             Project: PDFBox
          Issue Type: Improvement
          Components: Parsing
    Affects Versions: 2.0.0
            Reporter: Timo Boehme
            Assignee: Timo Boehme


PDFBox now has at least two different mechanisms for choosing between main
memory and keeping large data in a temporary file: if an input stream is
provided, it is copied to a temporary file; separately, all PDF streams that
are read are handled by RandomAccessBuffer/ScratchFile.
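The first mechanism, copying a provided input stream to a temporary file so
the parser gets random access to it, can be sketched roughly as follows. This
is a minimal illustration using only the JDK (`Files.copy`,
`RandomAccessFile`); the class `TempFileCopy` and its methods are hypothetical
and are not PDFBox code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, not PDFBox code: spool an input stream to a temp file. */
final class TempFileCopy {
    private TempFileCopy() {}

    /** Copies the whole stream into a new temporary file and returns its path. */
    static Path copyToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-input", ".tmp");
        tmp.toFile().deleteOnExit();
        // createTempFile already created the file, so the copy must replace it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Random access to the copy: reads one byte at an arbitrary offset (-1 at EOF). */
    static int readAt(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.read();
        }
    }
}
```

Note that in this sketch the whole input is always written to disk,
regardless of its size.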

In PDFBOX-2882 I've re-implemented ScratchFile; the new implementation is
quite fast and allows setting a maximum amount of memory to be used for its
pages before it starts using the scratch file. This implementation could serve
as the general 'backend' for all buffered streams and even for the copy of the
input stream to a temporary file. As long as the PDF fits into the allowed
maximum memory, it should be as fast as RandomAccessBuffer, while it gives good
control over memory usage by falling back to the scratch file when needed.
This prevents OutOfMemoryError (OOM) with large files.
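A minimal sketch of the idea, assuming fixed-size pages and a simple policy in
which the first pages that fit under the memory limit stay on the heap and
every later page is written to a single temporary file. `PagedScratchBuffer`,
its 4096-byte page size and its methods are hypothetical; this is not the
PDFBOX-2882 implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: heap pages up to a memory limit, then a scratch file. */
final class PagedScratchBuffer implements AutoCloseable {
    static final int PAGE_SIZE = 4096;              // assumed page size

    private final int maxMemoryPages;               // pages allowed on the heap
    private final List<byte[]> memoryPages = new ArrayList<>();
    private RandomAccessFile scratch;               // created lazily on first spill
    private File scratchFile;
    private long length;                            // total bytes written so far

    PagedScratchBuffer(long maxMainMemoryBytes) {
        long pages = Math.max(0L, maxMainMemoryBytes / PAGE_SIZE);
        this.maxMemoryPages = (int) Math.min(Integer.MAX_VALUE, pages);
    }

    /** Appends len bytes from src at the end of the buffer. */
    void write(byte[] src, int off, int len) throws IOException {
        while (len > 0) {
            int pageIdx = (int) (length / PAGE_SIZE);
            int inPage = (int) (length % PAGE_SIZE);
            int n = Math.min(len, PAGE_SIZE - inPage);
            if (pageIdx < maxMemoryPages) {
                // Sequential writes: a new page is always the next list entry.
                if (pageIdx == memoryPages.size()) memoryPages.add(new byte[PAGE_SIZE]);
                System.arraycopy(src, off, memoryPages.get(pageIdx), inPage, n);
            } else {
                ensureScratch();
                scratch.seek((long) (pageIdx - maxMemoryPages) * PAGE_SIZE + inPage);
                scratch.write(src, off, n);
            }
            off += n;
            len -= n;
            length += n;
        }
    }

    /** Returns the byte at pos (0..255), or -1 if pos is outside the data. */
    int read(long pos) throws IOException {
        if (pos < 0 || pos >= length) return -1;
        int pageIdx = (int) (pos / PAGE_SIZE);
        int inPage = (int) (pos % PAGE_SIZE);
        if (pageIdx < maxMemoryPages) return memoryPages.get(pageIdx)[inPage] & 0xFF;
        scratch.seek((long) (pageIdx - maxMemoryPages) * PAGE_SIZE + inPage);
        return scratch.read();
    }

    long length() { return length; }

    boolean usesScratchFile() { return scratch != null; }

    private void ensureScratch() throws IOException {
        if (scratch == null) {
            scratchFile = File.createTempFile("scratch", ".tmp");
            scratch = new RandomAccessFile(scratchFile, "rw");
        }
    }

    @Override
    public void close() throws IOException {
        if (scratch != null) {
            scratch.close();
            scratchFile.delete();
        }
    }
}
```

With a very large limit (for example `Long.MAX_VALUE`) the scratch file is
effectively never used, which corresponds to the pure in-memory case; with a
limit of 0 every page goes to the scratch file.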

To use this, the PDDocument methods should be changed to take a MemoryHandling
object instead of a 'useScratchFile' parameter. This object would describe the
buffering strategy (whether to use a ScratchFile, how much main memory may be
used, ...).
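A rough sketch of what such a parameter object might look like. Only the name
MemoryHandling comes from this proposal; the factory methods, the -1
convention for unlimited memory and the mapping from the old boolean flag are
assumptions for illustration, not PDFBox API.

```java
/**
 * Hypothetical parameter object for the proposed MemoryHandling idea.
 * Names and conventions are illustrative only, not PDFBox API.
 */
final class MemoryHandling {
    private static final long UNLIMITED = -1;

    private final boolean useScratchFile;
    private final long maxMainMemoryBytes;   // UNLIMITED (-1) means no cap

    private MemoryHandling(boolean useScratchFile, long maxMainMemoryBytes) {
        this.useScratchFile = useScratchFile;
        this.maxMainMemoryBytes = maxMainMemoryBytes;
    }

    /** Keep everything in main memory (risk of OOM for very large files). */
    static MemoryHandling mainMemoryOnly() {
        return new MemoryHandling(false, UNLIMITED);
    }

    /** Buffer everything in the scratch file; no main memory for pages. */
    static MemoryHandling scratchFileOnly() {
        return new MemoryHandling(true, 0);
    }

    /** Use up to maxMainMemoryBytes of main memory, then the scratch file. */
    static MemoryHandling mixed(long maxMainMemoryBytes) {
        if (maxMainMemoryBytes < 0) {
            throw new IllegalArgumentException("maxMainMemoryBytes must be >= 0");
        }
        return new MemoryHandling(true, maxMainMemoryBytes);
    }

    /** Assumed mapping of the old boolean 'useScratchFile' parameter. */
    static MemoryHandling fromLegacyFlag(boolean useScratchFile) {
        return useScratchFile ? scratchFileOnly() : mainMemoryOnly();
    }

    boolean usesScratchFile() { return useScratchFile; }

    /** True if a buffer of the given size may stay entirely in main memory. */
    boolean fitsInMainMemory(long bytes) {
        return maxMainMemoryBytes == UNLIMITED || bytes <= maxMainMemoryBytes;
    }
}
```

A loader could then accept a MemoryHandling argument where it currently takes
`boolean useScratchFile`, and pass the memory limit on to the buffer backend.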

I've opened this issue for discussion. Since this requires API changes in
PDDocument, it should be done before the 2.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
