Igor created PDFBOX-3171:
----------------------------
Summary: Image extraction is slow
Key: PDFBOX-3171
URL: https://issues.apache.org/jira/browse/PDFBOX-3171
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 2.0.0
Reporter: Igor
Hello,
Image extraction in 2.0 is very slow. If I use 1.8.10:
> java -jar pdfbox-app-1.8.10.jar ExtractImages MANUAL000039763.pdf
It can extract all images in just 3-4 seconds.
If I use 2.0-rc2 or the latest snapshot:
> java -jar pdfbox-app-2.0.0-20151217.170042-1863.jar ExtractImages MANUAL000039763.pdf
It takes 55-60 seconds to do the same. I profiled it with VisualVM, which
showed that most of the time is spent in these two methods:
org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT()
    java.awt.image.ColorConvertOp.filter() 42.9% (22,578 msec)
org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.loadICCProfile()
    java.awt.Color.<init>() 33.9% (19,444 msec)
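
To check how much of this cost comes from the JDK itself, the first hot spot
can be timed in isolation with plain JDK classes. This is only a rough sketch:
the class name ColorConvertTiming, the helper convertToRgb, and the image size
are made up for illustration, and it converts a blank synthetic grayscale
image to RGB instead of using the ICC profile embedded in the attached PDF, so
it does not reproduce the PDFBox code path.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

public class ColorConvertTiming {

    // Hypothetical helper: converts a blank grayscale image of the given size
    // to an RGB image with ColorConvertOp.filter() and returns the result.
    static BufferedImage convertToRgb(int width, int height) {
        BufferedImage src = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        // With no explicit color spaces, the op converts from the source
        // image's color space to the destination image's color space.
        ColorConvertOp op = new ColorConvertOp(null);
        return op.filter(src, dst);
    }

    public static void main(String[] args) {
        // Assumed image size, not taken from the sample file.
        int width = 2000;
        int height = 2000;
        long start = System.nanoTime();
        convertToRgb(width, height);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("ColorConvertOp.filter() on " + width + "x" + height
                + " took " + elapsedMs + " msec");
    }
}
```

If a single conversion like this is fast, the time seen in the profile is more
likely due to how often the conversion and profile loading happen per image
than to one call being slow.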
Is there any way to make it faster? How does it work so fast in 1.8.10?
Sample file is attached.
Thanks.