Igor created PDFBOX-3171:
----------------------------

             Summary: Image extraction is slow
                 Key: PDFBOX-3171
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3171
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 2.0.0
            Reporter: Igor


Hello,

Image extraction in 2.0 is very slow. If I use 1.8.10:
> java -jar pdfbox-app-1.8.10.jar ExtractImages MANUAL000039763.pdf

It can extract all images in just 3-4 seconds.

If I use 2.0-rc2 or the latest snapshot:
> java -jar pdfbox-app-2.0.0-20151217.170042-1863.jar ExtractImages MANUAL000039763.pdf

It takes 55-60 seconds to do the same. I profiled it with VisualVM, which
showed that most of the time is spent in these two methods:

org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT()
java.awt.image.ColorConvertOp.filter() 42.9%  (22,578 msec)

org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.loadICCProfile()
java.awt.Color.<init>() 33.9% (19,444 msec)
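
To check whether java.awt.image.ColorConvertOp.filter() is slow on a given JVM independently of PDFBox, a small stand-alone timing sketch along these lines may help. This is not PDFBox code: the class name ColorConvertTiming, the helper convertToSRGB, and the 2000x2000 grayscale test image are assumptions made for illustration, and the image does not carry the ICC profile from the attached PDF, so the timings are only a rough indication.

```java
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

// Hypothetical micro-benchmark, not part of PDFBox.
class ColorConvertTiming {

    // Converts src into a new sRGB image via ColorConvertOp.filter(),
    // the AWT call that dominates the first entry of the profile.
    static BufferedImage convertToSRGB(BufferedImage src) {
        BufferedImage dest = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        // With this constructor the conversion goes from the source image's
        // color space to the destination's, so dest must not be null.
        ColorConvertOp op = new ColorConvertOp((RenderingHints) null);
        return op.filter(src, dest);
    }

    public static void main(String[] args) {
        // Assumed test input: a blank 2000x2000 grayscale image.
        BufferedImage gray = new BufferedImage(
                2000, 2000, BufferedImage.TYPE_BYTE_GRAY);

        // Run a few times so JIT warm-up is visible in the numbers.
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            convertToSRGB(gray);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("run " + i + ": " + millis + " ms");
        }
    }
}
```

It can be run with "java ColorConvertTiming.java" on Java 11 or later. If the conversion is already slow here, the cost is in the JDK color management rather than in PDFBox itself.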

Is there any way to make it faster? How does it work so fast in 1.8.10?

Sample file is attached.

Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
