[ 
https://issues.apache.org/jira/browse/PDFBOX-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628556#comment-14628556
 ] 

John Hewson commented on PDFBOX-2882:
-------------------------------------

{quote}
I've downloaded testPDF_childAttachments.pdf from PDFBOX-2856 and run 
PDDocument.load(File,useScratchFile) on it and queried page count. 
{quote}

Is getting the page count the only thing you're benchmarking? If so, it's not a 
representative use case. Try processing a multi-page file with many streams, 
e.g. extracting text from a large file or, better yet, rendering a file with 
lots of images.

You're seeing an improvement over subsequent runs of your benchmark because the 
OS is also caching the PDF file, so the later measurements are not 
representative either. They are still useful for comparing the different 
implementations, though, because each implementation is subject to the same 
effect.
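
To make that kind of comparison explicit, something like the harness below 
might help. It is only a sketch under my own assumptions: the class name 
WarmBenchmark and the method compare are hypothetical and are not part of 
PDFBox. It runs the variants interleaved, discards a number of warm-up rounds 
so that OS file caching affects every variant equally, and reports the mean 
time per variant over the remaining rounds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical comparison harness; not part of PDFBox. */
final class WarmBenchmark {

    /**
     * Runs every workload once per round, interleaved, and returns the mean
     * elapsed time in nanoseconds per workload over the measured rounds only.
     * Warm-up rounds are executed but their timings are discarded.
     */
    static Map<String, Long> compare(Map<String, Runnable> workloads,
                                     int warmupRounds, int measuredRounds) {
        if (measuredRounds <= 0) {
            throw new IllegalArgumentException("measuredRounds must be positive");
        }
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String name : workloads.keySet()) {
            totals.put(name, 0L);
        }

        for (int round = 0; round < warmupRounds + measuredRounds; round++) {
            // Interleave the variants so each one sees the same cache state.
            for (Map.Entry<String, Runnable> entry : workloads.entrySet()) {
                long start = System.nanoTime();
                entry.getValue().run();
                long elapsed = System.nanoTime() - start;
                if (round >= warmupRounds) {
                    totals.merge(entry.getKey(), elapsed, Long::sum);
                }
            }
        }

        // Turn the summed times into per-round means.
        totals.replaceAll((name, total) -> total / measuredRounds);
        return totals;
    }
}
```

Each Runnable would wrap one variant of the full use case, for example the 
PDDocument.load(File,useScratchFile) call quoted above followed by text 
extraction or rendering, once with and once without the scratch file. The 
absolute numbers will still reflect a warm cache, so only the ratio between 
variants is meaningful.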

> Improve performance when using scratch file
> -------------------------------------------
>
>                 Key: PDFBOX-2882
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2882
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>         Attachments: ScratchFile.java, ScratchFileBuffer.java
>
>
> The current scratch file implementation uses many direct I/O calls, which 
> slows down parsing considerably compared with the in-memory scratch buffer.


