[jira] [Comment Edited] (PDFBOX-2883) Unify memory handling

JIRA Wed, 15 Jul 2015 04:17:26 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627902#comment-14627902
 ]


Andreas Lehmkühler edited comment on PDFBOX-2883 at 7/15/15 11:16 AM:
----------------------------------------------------------------------

Just before starting the discussion I'd like to clarify one point:
{quote}
PDFBOX now has at least 2 different mechanisms to use main memory vs. keeping 
large data in temporary file: in case of provided input stream the stream is 
copied to temporary file and all read PDF streams are handled by 
RandomAccessBuffer/ScratchFile.
{quote}
That's not correct. After streamlining the whole PDF source handling there is 
only one mechanism left. The user has to decide whether to use memory-based or 
scratch-file-based buffering. That is true for internal buffers as well as for 
the copy of the input stream.


> Unify memory handling
> ---------------------
>
>                 Key: PDFBOX-2883
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2883
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>
> PDFBOX now has at least 2 different mechanisms to use main memory vs. keeping 
> large data in temporary file: in case of provided input stream the stream is 
> copied to temporary file and all read PDF streams are handled by 
> RandomAccessBuffer/ScratchFile.
> In PDFBOX-2882 I've done a re-implementation of ScratchFile which is quite 
> fast and allows setting a maximum amount of memory to be used for its pages 
> before it starts using the scratch file. This implementation could be used as 
> the general 'backend' for all buffered streams and even the file input stream 
> copy. As long as the PDF fits into the allowed maximum memory it should be 
> as fast as RandomAccessBuffer while it allows for good control of memory 
> usage by going to the scratch file if needed. This prevents OOM in case of 
> large files.
> In order to use this, the PDDocument methods should be changed to not have a 
> 'useScratchFile' parameter but to take a MemoryHandling object which details 
> the buffering strategy (whether to use ScratchFile, what amount of main 
> memory can be used, ...).
> I've opened this issue for discussion. Since we need API changes in 
> PDDocument, it should be done before the 2.0 release.
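
For illustration only, the MemoryHandling object proposed in the description
could look roughly like the sketch below. The class name is taken from the
issue text; everything else is an assumption made for this example and is not
the PDFBox API: the factory methods (mainMemoryOnly, scratchFileOnly, mixed),
the fitsInMainMemory helper, and the convention that a limit of -1 means
"unrestricted" are all hypothetical.

```java
// Hypothetical sketch of the proposed MemoryHandling object.
// None of these names are taken from PDFBox; they only illustrate the idea
// of replacing a boolean 'useScratchFile' flag with a strategy object.
final class MemoryHandling {
    private final boolean useMainMemory;
    private final boolean useScratchFile;
    private final long maxMainMemoryBytes; // -1 means unrestricted (assumption)

    private MemoryHandling(boolean useMainMemory, boolean useScratchFile,
                           long maxMainMemoryBytes) {
        this.useMainMemory = useMainMemory;
        this.useScratchFile = useScratchFile;
        this.maxMainMemoryBytes = maxMainMemoryBytes;
    }

    /** Buffer everything in main memory, without any limit. */
    public static MemoryHandling mainMemoryOnly() {
        return new MemoryHandling(true, false, -1);
    }

    /** Buffer everything in the scratch file. */
    public static MemoryHandling scratchFileOnly() {
        return new MemoryHandling(false, true, 0);
    }

    /** Use main memory up to the given limit, then fall back to the scratch file. */
    public static MemoryHandling mixed(long maxMainMemoryBytes) {
        if (maxMainMemoryBytes < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        return new MemoryHandling(true, true, maxMainMemoryBytes);
    }

    public boolean usesScratchFile() {
        return useScratchFile;
    }

    /** Returns true if a buffer of the given total size may stay in main memory. */
    public boolean fitsInMainMemory(long totalBytes) {
        if (!useMainMemory) {
            return false;
        }
        return maxMainMemoryBytes < 0 || totalBytes <= maxMainMemoryBytes;
    }
}
```

A loading method could then accept such an object in place of the boolean
flag, for example MemoryHandling.mixed(64L * 1024 * 1024) to allow 64 MB of
main memory before data goes to the scratch file. How PDDocument would
actually expose this is exactly what this issue is meant to discuss.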


