[ https://issues.apache.org/jira/browse/PDFBOX-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme updated PDFBOX-1175:
--------------------------------

    Attachment: BaseParser_readUntilEndStream.java

The optimized method (BaseParser#readUntilEndStream) for copying stream data
from file to random access buffer.
                
> Stream parsing performance improvement + patch
> ----------------------------------------------
>
>                 Key: PDFBOX-1175
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1175
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 1.7.0
>            Reporter: Timo Boehme
>            Priority: Minor
>         Attachments: BaseParser_readUntilEndStream.java
>
>
> From a performance point of view, stream parsing is one of the most critical
> parts, since most data is typically stored in streams. While PDFBox already
> got some speedup a while ago in the method that copies stream data from file
> to random access buffer (BaseParser#readUntilEndStream), there is still room
> for improvement.
> The problem with the current implementation is that it reads and writes the
> data byte by byte. I have rewritten the method to use byte arrays for I/O and
> reduced the number of comparisons needed to find 'endstream'/'endobj'.
> This makes parsing of streams 7-8 times faster and parsing of a typical
> 10-page PDF 3-4 times faster.
> See the attached file, which is a drop-in replacement for the
> readUntilEndStream method in BaseParser.
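
The approach described in the issue (byte arrays for I/O plus fewer comparisons
when looking for 'endstream'/'endobj') can be illustrated with a small,
self-contained Java sketch. This is NOT the code from the attached
BaseParser_readUntilEndStream.java: the class EndStreamCopier, the method
copyUntilEndStream and its bufSize parameter are hypothetical names. The
comparison-saving trick shown here is an assumption in the spirit of
Boyer-Moore: if the byte five positions ahead cannot occur in the first six
bytes of either keyword, no keyword can start at any of the next six
positions, so all six are skipped with a single table lookup. A few bytes are
held back between reads so a keyword split across two chunks is still found.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper sketching chunked copying with a quick-skip keyword search. */
final class EndStreamCopier {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ENDOBJ = "endobj".getBytes(StandardCharsets.US_ASCII);
    // Length of the shorter keyword ("endobj"); every keyword covers at least this many bytes.
    private static final int MIN_LEN = ENDOBJ.length;
    // Bytes held back between reads so a keyword split across two chunks is still found.
    private static final int CARRY = ENDSTREAM.length - 1;
    // IN_PREFIX[b] is true if byte b occurs in the first MIN_LEN bytes of either keyword.
    private static final boolean[] IN_PREFIX = new boolean[256];

    static {
        for (int k = 0; k < MIN_LEN; k++) {
            IN_PREFIX[ENDSTREAM[k] & 0xFF] = true;
            IN_PREFIX[ENDOBJ[k] & 0xFF] = true;
        }
    }

    private EndStreamCopier() {}

    /** True if key occurs at buf[pos] and lies entirely before limit. */
    private static boolean matchesAt(byte[] buf, int pos, int limit, byte[] key) {
        if (pos + key.length > limit) {
            return false;
        }
        for (int k = 0; k < key.length; k++) {
            if (buf[pos + k] != key[k]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Copies bytes from in to out until "endstream" or "endobj" is seen.
     * The keyword is not copied. Returns false if the input ends without a keyword.
     */
    static boolean copyUntilEndStream(InputStream in, OutputStream out, int bufSize)
            throws IOException {
        byte[] buf = new byte[Math.max(bufSize, 2 * ENDSTREAM.length)];
        int len = 0; // carried-over tail plus newly read bytes
        int n;
        while ((n = in.read(buf, len, buf.length - len)) > 0) {
            len += n;
            int limit = len - CARRY; // positions below this can be fully tested
            int i = 0;
            while (i < limit) {
                // Assumed quick test: a byte that appears in no keyword prefix rules out
                // a match starting anywhere in i..i+MIN_LEN-1, so jump past it.
                if (!IN_PREFIX[buf[i + MIN_LEN - 1] & 0xFF]) {
                    i += MIN_LEN;
                    continue;
                }
                if (matchesAt(buf, i, len, ENDSTREAM) || matchesAt(buf, i, len, ENDOBJ)) {
                    out.write(buf, 0, i);
                    return true;
                }
                i++;
            }
            out.write(buf, 0, i); // bytes before i can no longer start a keyword
            System.arraycopy(buf, i, buf, 0, len - i);
            len -= i;
        }
        // End of input: check the held-back tail with bounds-checked comparisons.
        for (int i = 0; i < len; i++) {
            if (matchesAt(buf, i, len, ENDSTREAM) || matchesAt(buf, i, len, ENDOBJ)) {
                out.write(buf, 0, i);
                return true;
            }
        }
        out.write(buf, 0, len);
        return false;
    }
}
```

For example, copying from a ByteArrayInputStream containing "abc endobj"
writes "abc " and returns true. Unlike a real parser, this sketch does not
make the bytes it read past the keyword available again; a parser working on
a plain InputStream would have to unread them, for instance with
java.io.PushbackInputStream.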

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
