[jira] [Commented] (PDFBOX-2883) Unify memory handling

Timo Boehme (JIRA) Tue, 29 Sep 2015 01:02:25 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934817#comment-14934817
 ]


Timo Boehme commented on PDFBOX-2883:
-------------------------------------

Please see my comment here: 
https://issues.apache.org/jira/browse/PDFBOX-2882?focusedCommentId=14628303&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14628303
While the testing wasn't thorough, there was at least no noticeable slowdown. 
You can check it yourself using a version before revision 1705657. To test 
in-memory processing performance, load a document via the (now removed) 
{{PDDocument.load( ... boolean useScratchFile )}} and via the new 
{{PDDocument.load( ..., MemoryUsageSetting.setupMainMemoryOnly() )}}.
If you do find a considerable slowdown, we can add a fast path in 
ScratchFile that handles the specific case of unrestricted main memory usage. 

> Unify memory handling
> ---------------------
>
>                 Key: PDFBOX-2883
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2883
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>             Fix For: 2.0.0
>
>         Attachments: MemoryUsage.java
>
>
> PDFBox now has at least 2 different mechanisms for using main memory vs. keeping 
> large data in a temporary file: when an input stream is provided, the stream is 
> copied to a temporary file, and all PDF streams that are read are handled by 
> RandomAccessBuffer/ScratchFile.
> In PDFBOX-2882 I've done a re-implementation of ScratchFile which is quite 
> fast and allows setting a maximum amount of memory to be used for its pages 
> before it starts using the scratch file. This implementation could be used as 
> the general 'backend' for all buffered streams and even the file input stream 
> copy. As long as the PDF fits into the allowed maximum memory, it should be 
> as fast as RandomAccessBuffer, while allowing good control of memory 
> usage by going to the scratch file if needed. This prevents OOM in case of large 
> files.
> In order to use this, the PDDocument methods should be changed to not have a 
> 'useScratchFile' parameter but to take a MemoryHandling object which details 
> the buffering strategy (using ScratchFile; what amount of main memory can be 
> used, ...).
> I've opened this issue for discussion. Since we need API changes in 
> PDDocument, it should be done before the 2.0 release.
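
To make the buffering idea in the description concrete, here is a minimal 
sketch of a page store that keeps pages in main memory up to a configured 
limit and only then spills further pages to a temporary file. It is an 
illustration under assumptions, not the actual ScratchFile code: the class 
name PagedBufferSketch, the 4096-byte page size, and the methods writePage, 
readPage and usesScratchFile are all hypothetical and use only JDK classes.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is NOT the PDFBox ScratchFile implementation. */
public class PagedBufferSketch implements Closeable {
    /** Assumed page size for this sketch. */
    public static final int PAGE_SIZE = 4096;

    private final int maxMemoryPages;                  // cap on pages kept in main memory
    private final List<byte[]> memoryPages = new ArrayList<>();
    private File scratch;                              // temp file, created lazily on first spill
    private RandomAccessFile scratchFile;
    private int filePages;                             // number of pages spilled so far

    public PagedBufferSketch(long maxMainMemoryBytes) {
        long pages = Math.max(0L, maxMainMemoryBytes / PAGE_SIZE);
        this.maxMemoryPages = (int) Math.min(Integer.MAX_VALUE, pages);
    }

    /** Stores one full page and returns its index. */
    public int writePage(byte[] page) throws IOException {
        if (page.length != PAGE_SIZE) {
            throw new IllegalArgumentException("page must be " + PAGE_SIZE + " bytes");
        }
        if (memoryPages.size() < maxMemoryPages) {
            // Still within the memory budget: keep a private copy in memory.
            memoryPages.add(page.clone());
            return memoryPages.size() - 1;
        }
        if (scratchFile == null) {
            // Memory budget exhausted for the first time: open the temp file.
            scratch = File.createTempFile("paged-buffer-sketch", ".tmp");
            scratch.deleteOnExit();
            scratchFile = new RandomAccessFile(scratch, "rw");
        }
        scratchFile.seek((long) filePages * PAGE_SIZE);
        scratchFile.write(page);
        return maxMemoryPages + filePages++;
    }

    /** Returns a copy of the page stored under the given index. */
    public byte[] readPage(int index) throws IOException {
        if (index < 0 || index >= memoryPages.size() + filePages) {
            throw new IndexOutOfBoundsException("no such page: " + index);
        }
        if (index < memoryPages.size()) {
            return memoryPages.get(index).clone();
        }
        byte[] page = new byte[PAGE_SIZE];
        scratchFile.seek((long) (index - maxMemoryPages) * PAGE_SIZE);
        scratchFile.readFully(page);
        return page;
    }

    /** True once at least one page has been written to the temp file. */
    public boolean usesScratchFile() {
        return scratchFile != null;
    }

    @Override
    public void close() throws IOException {
        memoryPages.clear();
        if (scratchFile != null) {
            scratchFile.close();
            scratch.delete();
        }
    }
}
```

With a limit of two pages, the first two writePage calls stay in memory and 
the third one creates the temp file. A document that fits into the limit 
never touches the disk, which is the case the description expects to be about 
as fast as a pure in-memory buffer.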



