[
https://issues.apache.org/jira/browse/PDFBOX-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960201#comment-13960201
]
Tilman Hausherr edited comment on PDFBOX-2007 at 4/4/14 8:44 PM:
-----------------------------------------------------------------
Yeah, 2.0 is really much slower. 1.8 renders the image almost immediately,
2.0 needs several seconds. I found two slow operations: "Do" (Invoke named
XObject, which is a DCT image here), and "Tf" (select font). I can't comment on
Tf but I can comment on Do. It uses the DCTFilter. -The old version just used
the JPXFilter.- Both use Java ImageIO; the old version used a "dumb" method,
the new version supports CMYK, YCbCr and YCCK JPEGs and uses all sorts of
workarounds to bypass Java bugs.
Anyway, I found one piece of code I could optimize: fromBGRtoRGB(), done in rev
1584837 for the trunk. It now handles one row of pixels per call instead of a
single pixel. That method now takes 30% of its previous time. Funny thing is
that handling the whole image at once is slower (60% instead of 30%). This
still won't be as fast as 1.8, but it is a small start: rendering page 1 now
takes 4 secs instead of 5 secs on my PC. Luckily, there are not many files with
DCT encoding.
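
For readers who have not looked at the commit, the row-at-a-time idea can be
sketched roughly as follows. This is not the code from rev 1584837; the class
and method names (BgrToRgbSketch, swapPerPixel, swapPerRow) are made up for
illustration, and the sketch assumes a 3-band raster whose origin is (0, 0).
Only the standard java.awt.image raster calls are used.

```java
import java.awt.image.WritableRaster;

// Hypothetical illustration only; not the PDFBox implementation.
final class BgrToRgbSketch {

    // Per-pixel variant: one getPixel/setPixel round trip for every pixel.
    static void swapPerPixel(WritableRaster raster) {
        int[] px = new int[3]; // assumes exactly 3 bands (B, G, R)
        for (int y = 0; y < raster.getHeight(); y++) {
            for (int x = 0; x < raster.getWidth(); x++) {
                raster.getPixel(x, y, px);
                int tmp = px[0];
                px[0] = px[2];
                px[2] = tmp;
                raster.setPixel(x, y, px);
            }
        }
    }

    // Per-row variant: one getPixels/setPixels round trip for every row,
    // with the channel swap done on a reusable int buffer.
    static void swapPerRow(WritableRaster raster) {
        int width = raster.getWidth();
        int[] row = new int[width * 3]; // assumes exactly 3 bands (B, G, R)
        for (int y = 0; y < raster.getHeight(); y++) {
            raster.getPixels(0, y, width, 1, row);
            for (int i = 0; i < row.length; i += 3) {
                int tmp = row[i];
                row[i] = row[i + 2];
                row[i + 2] = tmp;
            }
            raster.setPixels(0, y, width, 1, row);
        }
    }
}
```

Both variants produce the same result; the per-row one simply makes far fewer
raster calls. Fetching the whole image in one call would need a buffer of
width * height * 3 ints, which may be part of why that variant measured slower,
though that is a guess rather than something that was profiled here.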
About the PDF - do you mind if I reupload it "for committers only" after you
deleted it? This is just so that it is part of the record.
> Performance regression since PDFRenderer
> ----------------------------------------
>
> Key: PDFBOX-2007
> URL: https://issues.apache.org/jira/browse/PDFBOX-2007
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.0
> Reporter: François Bernier
> Labels: perfomance, regression
> Attachments: PDFBOX-2007.pdf
>
>
> Hi,
> I have the following toy project where I use PDFBox:
> https://github.com/fbernier/taz-clj
> I've been using the snapshot versions of PDFBox for quite a while, and
> since the recent move from RenderUtil#convertToImage to
> PDFRenderer#renderImage (this commit:
> https://github.com/fbernier/taz-clj/commit/47917d494f2a9a0999da7f36827c45145d4bb42c),
> there has been quite a big performance regression. If I change the PDFBox
> dependency to 1.8.x, everything is good. Here are my benchmarks:
> PDFBox 1.8.x:
> Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
>   4 threads and 4 connections
>   Thread Stats   Avg        Stdev      Max        +/- Stdev
>     Latency      208.98ms   58.27ms    391.43ms   52.08%
>     Req/Sec      4.63       1.73       8.00       62.88%
>   1224 requests in 1.00m, 72.34MB read
> Requests/sec: 20.40
> Transfer/sec: 1.21MB
>
> PDFBox 2.0.0:
> Running 1m test @ http://127.0.0.1:8080/testing.pdf?page=1
>   4 threads and 4 connections
>   Thread Stats   Avg        Stdev      Max        +/- Stdev
>     Latency      920.25ms   378.94ms   2.76s      91.38%
>     Req/Sec      0.80       0.40       1.00       80.17%
>   275 requests in 1.00m, 15.85MB read
> Requests/sec: 4.58
> Transfer/sec: 270.41KB
>
> I have not looked any further than this and have no more data to give you
> (yet).