[
https://issues.apache.org/jira/browse/PDFBOX-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628303#comment-14628303
]
Timo Boehme commented on PDFBOX-2882:
-------------------------------------
I'm not quite sure how or what you are measuring. I've downloaded
testPDF_childAttachments.pdf from PDFBOX-2856, run
PDDocument.load(File,useScratchFile) on it and queried the page count. I've
run this 10 times within the same VM in three configurations: the new scratch
file implementation without main memory, the new scratch file implementation
with 50MB of main memory, and without a scratch file. Results (all times in ms):
||scratch file, no main memory||scratch file, 50MB main memory||no scratch file||
|1432|1381|1248|
|630|522|775|
|897|482|789|
|757|579|610|
|575|472|751|
So, interestingly, the pure scratch file is nearly as fast as no scratch file,
while the scratch file implementation with added main memory is even faster;
I'm currently not sure why.
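In every column the first run is clearly slower than the rest, which is typical of JVM warm-up (class loading, JIT compilation), although the comment does not say so. To compare the configurations it can help to summarize each column with and without that first run. The following is a minimal sketch; the class name ScratchFileTimings is hypothetical, and it only uses the five rows listed in the table.

```java
import java.util.Arrays;

/** Hypothetical helper: summarizes the per-run load times (ms) from the table. */
public class ScratchFileTimings {

    // Arithmetic mean of the timings (NaN for an empty array).
    static double mean(long[] ms) {
        return Arrays.stream(ms).average().orElse(Double.NaN);
    }

    // Median of the timings; averages the two middle values for even counts.
    static double median(long[] ms) {
        long[] sorted = ms.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        if (n == 0) return Double.NaN;
        return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Drops the first (warm-up) run; assumes at least one run is present.
    static long[] withoutWarmup(long[] ms) {
        return Arrays.copyOfRange(ms, 1, ms.length);
    }

    public static void main(String[] args) {
        String[] names = {"scratch file, no mem", "scratch file, 50MB", "no scratch file"};
        long[][] columns = {
            {1432, 630, 897, 757, 575},
            {1381, 522, 482, 579, 472},
            {1248, 775, 789, 610, 751},
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-22s mean=%.2f median=%.1f mean(excl. 1st)=%.2f%n",
                names[i], mean(columns[i]), median(columns[i]),
                mean(withoutWarmup(columns[i])));
        }
    }
}
```

With these rows the means excluding the first run come to 714.75 ms, 513.75 ms and 731.25 ms respectively, consistent with the observation that the memory-backed scratch file is fastest and the pure scratch file is close to no scratch file.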
Did you load via InputStream?
> Improve performance when using scratch file
> -------------------------------------------
>
> Key: PDFBOX-2882
> URL: https://issues.apache.org/jira/browse/PDFBOX-2882
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Attachments: ScratchFile.java, ScratchFileBuffer.java
>
>
> The current scratch file implementation uses many direct I/O calls which
> slows down parsing compared with in-memory scratch buffer considerably.
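The issue description attributes the slowdown to many direct I/O calls. One general way to cut the number of calls, shown here only as a sketch under assumptions, is to move data in whole pages: the first pages stay on the heap, and only full pages are spilled to a temporary file with one write (and later one read) per page. The class PagedScratchBuffer, its 4 KiB page size and the maxMemoryPages budget are hypothetical; this is not the attached ScratchFile.java or the PDFBox API.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical page-based scratch buffer (NOT PDFBox's ScratchFile). */
public class PagedScratchBuffer implements AutoCloseable {
    static final int PAGE_SIZE = 4096; // assumed page size
    private final int maxMemoryPages;  // heap budget, in full pages
    private final List<byte[]> memoryPages = new ArrayList<>();
    private byte[] current = new byte[PAGE_SIZE]; // page currently being filled
    private int currentFill = 0;
    private int filePages = 0;         // full pages stored in the temp file
    private long length = 0;
    private File tempFile;
    private RandomAccessFile raf;
    private final byte[] cachedPage = new byte[PAGE_SIZE]; // last page read from file
    private long cachedPageIndex = -1;
    private long fileTransfers = 0;    // number of page-sized file reads/writes

    public PagedScratchBuffer(int maxMemoryPages) {
        if (maxMemoryPages < 0) throw new IllegalArgumentException("maxMemoryPages < 0");
        this.maxMemoryPages = maxMemoryPages;
    }

    /** Appends len bytes of b starting at off, one page at a time. */
    public void write(byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            int n = Math.min(len, PAGE_SIZE - currentFill);
            System.arraycopy(b, off, current, currentFill, n);
            currentFill += n; off += n; len -= n; length += n;
            if (currentFill == PAGE_SIZE) flushCurrentPage();
        }
    }

    // Keeps the full page on the heap while the budget allows, else writes it to the file in one call.
    private void flushCurrentPage() throws IOException {
        if (memoryPages.size() < maxMemoryPages) {
            memoryPages.add(current);
            current = new byte[PAGE_SIZE];
        } else {
            if (raf == null) {
                tempFile = File.createTempFile("scratch", ".tmp");
                raf = new RandomAccessFile(tempFile, "rw");
            }
            raf.seek((long) filePages * PAGE_SIZE);
            raf.write(current, 0, PAGE_SIZE);
            filePages++;
            fileTransfers++;
        }
        currentFill = 0;
    }

    /** Returns the byte at pos (0..255), or -1 if pos is outside the written data. */
    public int read(long pos) throws IOException {
        if (pos < 0 || pos >= length) return -1;
        long pageIndex = pos / PAGE_SIZE;
        int inPage = (int) (pos % PAGE_SIZE);
        int memCount = memoryPages.size();
        if (pageIndex < memCount) return memoryPages.get((int) pageIndex)[inPage] & 0xFF;
        long fileIndex = pageIndex - memCount;
        if (fileIndex < filePages) {
            if (cachedPageIndex != pageIndex) { // load the whole page with one read
                raf.seek(fileIndex * PAGE_SIZE);
                raf.readFully(cachedPage);
                cachedPageIndex = pageIndex;
                fileTransfers++;
            }
            return cachedPage[inPage] & 0xFF;
        }
        return current[inPage] & 0xFF; // partially filled last page
    }

    public long length() { return length; }

    public long fileTransfers() { return fileTransfers; }

    @Override
    public void close() throws IOException {
        if (raf != null) {
            raf.close();
            raf = null;
            if (!tempFile.delete()) tempFile.deleteOnExit();
        }
    }

    public static void main(String[] args) throws IOException {
        try (PagedScratchBuffer buf = new PagedScratchBuffer(2)) {
            byte[] data = new byte[5 * PAGE_SIZE + 100];
            for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);
            buf.write(data, 0, data.length);
            for (long p = 0; p < buf.length(); p++) {
                if (buf.read(p) != p % 251) throw new IllegalStateException("mismatch at " + p);
            }
            System.out.println("bytes=" + buf.length() + ", file page transfers=" + buf.fileTransfers());
        }
    }
}
```

Running main prints bytes=20580, file page transfers=6: two pages stay in memory, three full pages are written once and read back once, and the last partial page never touches the file. A design that went to the file for every byte would instead issue one call per byte spilled.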
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)