[ https://issues.apache.org/jira/browse/PDFBOX-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628556#comment-14628556 ]
John Hewson commented on PDFBOX-2882:
-------------------------------------
{quote}
I've downloaded testPDF_childAttachments.pdf from PDFBOX-2856 and run
PDDocument.load(File,useScratchFile) on it and queried page count.
{quote}
Is getting the page count the only thing you're benchmarking? If so, it's not a
representative use case. Try processing a multi-page file with many streams,
e.g. extracting text from a large file or, better yet, rendering a file with
lots of images.
You're seeing an improvement over subsequent runs of your benchmark because the
OS is caching the PDF file as well, so the later measurements are not
representative either. They are still useful for comparing the different
implementations, though, since each implementation is subject to the same
effect.
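A minimal sketch of how such a comparison could be set up, so that OS file
caching affects every implementation equally: run each workload a few times
untimed first, then compare the median of the timed runs. The names
ScratchBenchmark, measure and median are hypothetical helpers, not PDFBox API,
and the workloads in main are placeholders. In a real test each one would load
the document (e.g. via the PDDocument.load(File,useScratchFile) call quoted
above) and then extract text or render pages.

```java
import java.util.Arrays;

// Hypothetical harness, not part of PDFBox.
class ScratchBenchmark {

    /**
     * Runs the workload 'warmups' times untimed (so the OS file cache and the
     * JIT are warm for every implementation), then 'runs' times timed.
     * Returns the elapsed nanoseconds of each timed run.
     */
    public static long[] measure(Runnable workload, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            workload.run();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Median of the samples; less sensitive to outliers than the mean. */
    public static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        // Placeholder workloads (assumption): replace each body with
        // "load the PDF with/without scratch file, then extract text or render".
        Runnable inMemory = () -> Arrays.fill(new byte[1 << 20], (byte) 1);
        Runnable scratchFile = () -> Arrays.fill(new byte[1 << 22], (byte) 1);

        long a = median(measure(inMemory, 3, 10));
        long b = median(measure(scratchFile, 3, 10));
        System.out.println("in-memory median ns:    " + a);
        System.out.println("scratch file median ns: " + b);
    }
}
```

The absolute numbers from such a run still include the caching effect; only
the ratio between the implementations is meaningful.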
> Improve performance when using scratch file
> -------------------------------------------
>
> Key: PDFBOX-2882
> URL: https://issues.apache.org/jira/browse/PDFBOX-2882
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Attachments: ScratchFile.java, ScratchFileBuffer.java
>
>
> The current scratch file implementation uses many direct I/O calls, which
> slows down parsing considerably compared with the in-memory scratch buffer.