[ https://issues.apache.org/jira/browse/PDFBOX-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960201#comment-13960201 ]

Tilman Hausherr commented on PDFBOX-2007:
-----------------------------------------

Yeah, 2.0 is a lot slower. 1.8 renders the image almost immediately, while 2.0
needs several seconds. I found two slow operators: "Do" (invoke named XObject,
which is a DCT image here) and "Tf" (select font). I can't comment on Tf, but I
can comment on Do. It uses the DCTFilter; the old version just used the
JPXFilter. Both use Java ImageIO. The old version used a "dumb" method, while
the new version supports CMYK, YCbCr and YCCK JPEGs and uses all sorts of
workarounds to bypass Java bugs.
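For a sense of the per-pixel colour work this extra support involves, here is
a minimal sketch of a YCbCr-to-RGB conversion, assuming 8-bit samples and the
standard full-range JFIF equations. The class and method names are
hypothetical; this is not the code in PDFBox's DCTFilter:

```java
public class YCbCrToRgb {
    // Clamp a computed channel value into the 0..255 range of an 8-bit sample.
    static int clamp(double v) {
        return (int) Math.max(0, Math.min(255, Math.round(v)));
    }

    // Hypothetical helper: converts one YCbCr sample triple to a packed 0xRRGGBB int
    // using the full-range JFIF equations (Cb and Cr are centred on 128).
    static int toRgb(int y, int cb, int cr) {
        double cbOff = cb - 128.0;
        double crOff = cr - 128.0;
        int r = clamp(y + 1.402 * crOff);
        int g = clamp(y - 0.344136 * cbOff - 0.714136 * crOff);
        int b = clamp(y + 1.772 * cbOff);
        return (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        // Neutral chroma (Cb = Cr = 128) must give a grey with R = G = B = Y.
        System.out.println(Integer.toHexString(toRgb(128, 128, 128))); // prints 808080
        System.out.println(Integer.toHexString(toRgb(255, 128, 128))); // prints ffffff
    }
}
```

Applied to every pixel of a large page image, even simple arithmetic like this
adds up, which is why such loops are worth profiling.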

Anyway, I found one piece of code I could optimize: fromBGRtoRGB(), done in rev
1584837 for the trunk. It now handles one line of pixels at a time instead of
one pixel at a time. That part now takes 30% of the previous time (for that
method). The funny thing is that handling the whole image at once makes it
slower (60% instead of 30%). This still won't be as fast as 1.8, but it is a
small start: rendering page 1 now takes 4 seconds instead of 5 seconds on my
PC. Luckily, there are not many files with DCT encoding.
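The row-at-a-time idea can be sketched with the standard
java.awt.image.WritableRaster getPixels/setPixels calls. This is only an
illustration under stated assumptions (a raster with at least 3 bands, where
swapping bands 0 and 2 turns BGR into RGB); the class and method names are
hypothetical and this is not the actual fromBGRtoRGB() from rev 1584837:

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

public class BgrToRgb {
    // Hypothetical helper: swaps bands 0 and 2 of a raster with at least 3 bands,
    // reading and writing one whole scanline per call instead of one pixel per call.
    static void swapFirstAndThirdBand(WritableRaster raster) {
        int minX = raster.getMinX();
        int minY = raster.getMinY();
        int width = raster.getWidth();
        int height = raster.getHeight();
        int bands = raster.getNumBands();
        int[] row = new int[width * bands]; // one buffer, reused for every row
        for (int y = minY; y < minY + height; y++) {
            raster.getPixels(minX, y, width, 1, row);
            for (int i = 0; i < row.length; i += bands) {
                int tmp = row[i];
                row[i] = row[i + 2];
                row[i + 2] = tmp;
            }
            raster.setPixels(minX, y, width, 1, row);
        }
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        WritableRaster raster = image.getRaster();
        raster.setPixel(0, 0, new int[] {10, 20, 30});
        raster.setPixel(1, 0, new int[] {40, 50, 60});
        swapFirstAndThirdBand(raster);
        int[] first = raster.getPixel(0, 0, (int[]) null);
        System.out.println(first[0] + "," + first[1] + "," + first[2]); // prints 30,20,10
    }
}
```

Compared with a per-pixel loop, this makes two raster calls per scanline
rather than two per pixel, while keeping the working buffer small.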

About the PDF: do you mind if I re-upload it "for committers only" after you
have deleted it? This is just so that it is part of the record.

> Performance regression since PDFRenderer
> ----------------------------------------
>
>                 Key: PDFBOX-2007
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2007
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.0
>            Reporter: François Bernier
>              Labels: perfomance, regression
>         Attachments: testing.pdf
>
>
> Hi,
> I have the following toy project where I use PDFBox:
> https://github.com/fbernier/taz-clj
> I've been using the snapshot versions of PDFBox for quite a while, and since
> the recent move from RenderUtil#convertToImage to PDFRenderer#renderImage
> (this commit:
> https://github.com/fbernier/taz-clj/commit/47917d494f2a9a0999da7f36827c45145d4bb42c),
> there has been quite a big performance regression. If I change the PDFBox
> dependency to 1.8.x, everything is good. Here are my benchmarks:
>
> PDFBox 1.8.x:
> Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
>   4 threads and 4 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency   208.98ms   58.27ms 391.43ms   52.08%
>     Req/Sec     4.63      1.73     8.00     62.88%
>   1224 requests in 1.00m, 72.34MB read
> Requests/sec:     20.40
> Transfer/sec:      1.21MB
>
> PDFBox 2.0.0:
> Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
>   4 threads and 4 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency   920.25ms  378.94ms   2.76s    91.38%
>     Req/Sec     0.80      0.40     1.00     80.17%
>   275 requests in 1.00m, 15.85MB read
> Requests/sec:      4.58
> Transfer/sec:    270.41KB
>
> I have not looked any further than this and have no more data to give you 
> (yet).
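To put the two quoted benchmark runs side by side, the average latency and the
requests per second can be compared as simple ratios. The class and method
names below are made up for illustration; the numbers are copied from the wrk
output above:

```java
import java.util.Locale;

public class RegressionRatio {
    // How many times larger `after` is than `before`.
    static double ratio(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Averages copied from the two wrk runs quoted in the report.
        double latency18Ms = 208.98, latency20Ms = 920.25;
        double rps18 = 20.40, rps20 = 4.58;
        // Latency grows while throughput shrinks, so the arguments are swapped for throughput.
        System.out.println(String.format(Locale.ROOT, "latency x%.2f", ratio(latency18Ms, latency20Ms))); // latency x4.40
        System.out.println(String.format(Locale.ROOT, "throughput /%.2f", ratio(rps20, rps18))); // throughput /4.45
    }
}
```

Both ratios come out at roughly 4.4, consistent with the drop from 1224 to 275
completed requests in the one-minute runs.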



--
This message was sent by Atlassian JIRA
(v6.2#6252)
