[ 
https://issues.apache.org/jira/browse/PDFBOX-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628303#comment-14628303
 ] 

Timo Boehme commented on PDFBOX-2882:
-------------------------------------

I'm not quite sure how or what you are measuring. I downloaded 
testPDF_childAttachments.pdf from PDFBOX-2856, ran 
PDDocument.load(File,useScratchFile) on it and queried the page count. I ran 
this 10 times within the same VM in three configurations: the new scratch file 
implementation without main memory, the same with 50MB of main memory, and no 
scratch file at all. The results (all times in ms):
||scratch-file,no-mem||scratch-file,50MB||no-scratch-file||
|1432|1381|1248|
|630|522|775|
|897|482|789|
|757|579|610|
|575|472|751|

So interestingly the pure scratch file is nearly as fast as 
no scratch file, while the scratch file implementation with added main memory 
is even faster. I'm currently not sure why.
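
To make the comparison in the table easier to check, here is a small sketch that averages each column after dropping the first run. It assumes the first run is dominated by JVM warm-up (class loading, JIT), which is not something measured here. The names ScratchTimings and meanAfterWarmup are made up for illustration; the numbers are copied from the table.

```java
import java.util.Arrays;

public class ScratchTimings {
    // Timings in ms copied from the table, one array per column.
    static final long[] SCRATCH_NO_MEM = {1432, 630, 897, 757, 575};
    static final long[] SCRATCH_50MB   = {1381, 522, 482, 579, 472};
    static final long[] NO_SCRATCH     = {1248, 775, 789, 610, 751};

    /** Mean of the runs that remain after skipping the first {@code skip} runs. */
    static double meanAfterWarmup(long[] runs, int skip) {
        return Arrays.stream(runs).skip(skip).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Assumption: only the first run counts as warm-up.
        System.out.println("scratch-file,no-mem: " + meanAfterWarmup(SCRATCH_NO_MEM, 1));
        System.out.println("scratch-file,50MB:   " + meanAfterWarmup(SCRATCH_50MB, 1));
        System.out.println("no-scratch-file:     " + meanAfterWarmup(NO_SCRATCH, 1));
    }
}
```

Under that assumption the averages come out at about 715 ms (scratch-file,no-mem), 514 ms (scratch-file,50MB) and 731 ms (no-scratch-file), which is consistent with the reading above.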

Did you load via InputStream? 
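
For comparison, the kind of measurement loop described above can be sketched roughly as follows. This is a minimal sketch and not the code actually used: LoadTimer and timeRuns are hypothetical names, and the placeholder task stands in for a call such as PDDocument.load(File,useScratchFile) followed by a page count query.

```java
import java.util.concurrent.Callable;

public class LoadTimer {
    /** Runs the task repeatedly in the same VM and returns each run's wall-clock time in ms. */
    static long[] timeRuns(Callable<?> task, int runs) throws Exception {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload (assumption): replace with the real load + page count query.
        long[] times = timeRuns(() -> { Thread.sleep(5); return null; }, 10);
        for (long t : times) {
            System.out.println(t + " ms");
        }
    }
}
```

Passing the workload in as a Callable keeps the timing loop identical across configurations, so differences between runs come from the loader and not from the harness.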

> Improve performance when using scratch file
> -------------------------------------------
>
>                 Key: PDFBOX-2882
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2882
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>         Attachments: ScratchFile.java, ScratchFileBuffer.java
>
>
> The current scratch file implementation uses many direct I/O calls, which 
> slows down parsing considerably compared with the in-memory scratch buffer.



