[ 
https://issues.apache.org/jira/browse/PDFBOX-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621835#comment-14621835
 ] 

Andreas Lehmkühler commented on PDFBOX-2856:
--------------------------------------------

{quote}
When we load a file with:
pdfDocument = PDDocument.load(stream, password, true);

It takes 2.5 minutes to process.

When we load a file with:
pdfDocument = PDDocument.load(stream, password);
It takes 5 seconds (= to performance with 1.8.9).
{quote}
The scratch file was rewritten. The old approach doesn't support concurrent 
access, and I'm not talking about multithreading: when creating a PDF from 
scratch, one has to follow a special order when adding text/fonts/images to a 
stream, otherwise the whole stream gets messed up. The new scratch file uses 
paged access. Maybe that is the culprit here.
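
To illustrate what paged access means here, a minimal sketch in plain Java 
follows. It is not the actual PDFBox scratch file code: the names 
PagedScratchSketch, Buffer and allocatePage, and the tiny 8-byte PAGE_SIZE, 
are hypothetical and chosen only for illustration. It assumes one shared 
store of fixed-size pages, where every logical stream remembers which pages 
belong to it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of paged scratch storage; NOT PDFBox's implementation. */
class PagedScratchSketch {
    // Tiny page size (assumption) so the interleaving is easy to see.
    static final int PAGE_SIZE = 8;

    // Shared backing store: every page is a fixed-size block.
    private final List<byte[]> pages = new ArrayList<>();

    /** Adds a fresh page to the shared store and returns its index. */
    int allocatePage() {
        pages.add(new byte[PAGE_SIZE]);
        return pages.size() - 1;
    }

    /** Opens a new logical stream backed by the shared store. */
    Buffer createBuffer() {
        return new Buffer();
    }

    /** One logical stream; it only tracks which pages it owns. */
    class Buffer {
        private final List<Integer> pageIndexes = new ArrayList<>();
        private int length = 0;

        void write(byte[] data) {
            for (byte b : data) {
                int offset = length % PAGE_SIZE;
                if (offset == 0) {
                    // First write, or the last page is full: claim a new page.
                    pageIndexes.add(allocatePage());
                }
                int page = pageIndexes.get(pageIndexes.size() - 1);
                pages.get(page)[offset] = b;
                length++;
            }
        }

        /** Reassembles this stream's bytes by walking its own pages in order. */
        byte[] toByteArray() {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int remaining = length;
            for (int page : pageIndexes) {
                int n = Math.min(PAGE_SIZE, remaining);
                out.write(pages.get(page), 0, n);
                remaining -= n;
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) {
        PagedScratchSketch scratch = new PagedScratchSketch();
        Buffer text = scratch.createBuffer();
        Buffer image = scratch.createBuffer();

        // Interleaved writes to two streams, in no particular order.
        text.write("BT /F1 12 Tf ".getBytes(StandardCharsets.US_ASCII));
        image.write("q 100 0 0 100 ".getBytes(StandardCharsets.US_ASCII));
        text.write("(Hello) Tj ET".getBytes(StandardCharsets.US_ASCII));
        image.write("0 0 cm /Im1 Do Q".getBytes(StandardCharsets.US_ASCII));

        System.out.println(new String(text.toByteArray(), StandardCharsets.US_ASCII));
        System.out.println(new String(image.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

Because each buffer only follows its own list of page indexes, the two 
streams can be written in any interleaved order without corrupting each 
other. The price is the extra page lookup on every access, which is the kind 
of overhead that could matter for the file in this issue.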

Another possible factor is the parser used. Do you use the sequential or the 
non-sequential parser with 1.8.9?

> Markedly slower processing for particular file in 2.0.0-trunk vs 1.8.9
> ----------------------------------------------------------------------
>
>                 Key: PDFBOX-2856
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2856
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Assignee: John Hewson
>            Priority: Minor
>         Attachments: batch-process-error.zip, testPDF_childAttachments.pdf
>
>
> As part of TIKA-1285, we noticed that the attached file is taking quite a bit 
> longer with PDFBox 2.0.0 trunk than with 1.8.9.
> [~lehmi] confirmed 4-5x slower.
> Not sure what the cause is.


