[
https://issues.apache.org/jira/browse/PDFBOX-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628556#comment-14628556
]
John Hewson edited comment on PDFBOX-2882 at 7/15/15 7:08 PM:
--------------------------------------------------------------
{quote}
I've downloaded testPDF_childAttachments.pdf from PDFBOX-2856 and run
PDDocument.load(File,useScratchFile) on it and queried page count.
{quote}
Is getting the page count the only thing you're benchmarking? If so, it's not a
representative use case. Try processing a multi-page file with many streams,
e.g. extracting text from a large file or, better yet, rendering a file with
lots of images.
You're seeing an improvement in your benchmark over subsequent runs because the
OS is caching the PDF file too. That assumes you're launching a separate JVM
for each run; if not, JIT warm-up is probably the larger effect. Either way,
the later measurements are not representative either, but they are still useful
for comparing the different implementations, as each implementation is subject
to the same effect.
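To keep the warm-up effects described above out of the comparison, one option is to run the workload a few times unmeasured before timing it, and to report a median rather than a single run. The following is only a minimal sketch: the class name WarmBenchmark, the iteration counts and the arithmetic placeholder workload are assumptions made for illustration, not part of PDFBox or of the attached patch. In a real comparison the placeholder would be replaced by the operation under test, for example text extraction from a large file, run once per scratch-file implementation.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

// Hypothetical harness for illustration; not part of PDFBox.
class WarmBenchmark {

    /**
     * Calls the workload `warmups` times without timing it (this gives the JIT
     * a chance to compile hot paths and lets the OS cache any files it reads),
     * then calls it `runs` more times and returns the elapsed nanoseconds of
     * each measured call.
     */
    public static long[] measure(Callable<?> workload, int warmups, int runs)
            throws Exception {
        for (int i = 0; i < warmups; i++) {
            workload.call();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.call();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Returns the median of the given values; the input array is not modified. */
    public static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload (an assumption): substitute the real operation,
        // e.g. loading a document and extracting its text.
        Callable<Integer> workload = () -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            return sum;
        };
        long[] nanos = measure(workload, 5, 10);
        System.out.println("median ns per run: " + median(nanos));
    }
}
```

Because every implementation goes through the same warm-up, the medians remain comparable with each other even though none of them reflects a cold start.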
> Improve performance when using scratch file
> -------------------------------------------
>
> Key: PDFBOX-2882
> URL: https://issues.apache.org/jira/browse/PDFBOX-2882
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Attachments: ScratchFile.java, ScratchFileBuffer.java
>
>
> The current scratch file implementation uses many direct I/O calls, which
> slows down parsing considerably compared with the in-memory scratch buffer.