Timo Boehme created PDFBOX-4309:
-----------------------------------

             Summary: Performance regression in PDColorSpace#toRGBImageAWT Part 2
                 Key: PDFBOX-4309
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4309
             Project: PDFBox
          Issue Type: Improvement
          Components: Rendering
    Affects Versions: 2.0.11, 3.0.0 PDFBox
            Reporter: Timo Boehme
            Assignee: Timo Boehme
         Attachments: PDICCBased.java.patch

This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
graphics produced by CorelDraw which are composed of more than 2500(!) images, 
each with its own indexed color space based on an ICC color space (the shadows 
of graphic objects are created by a large number of gray lines ...). In our 
environment (OpenJDK 7 and OpenJDK 8, IcedTea, SUSE Linux 64-bit) rendering a 
single page with one graphic takes 780 seconds. Most of the time is spent 
creating the indexed color space via ICC color space mapping:
{noformat}
   java.lang.Thread.State: RUNNABLE
        at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
        at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
        at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
        - locked <0x0000000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
        at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
        at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
        at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
        at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
        at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
        at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
        at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
        at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
        at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
        at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
        at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
        at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
        at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
        at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
        at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
        at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
        at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
The problem here is that LittleCMS (LCMS) is called several thousand times, 
which takes far too much time. Unfortunately, using KCMS via 
{{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is not an option 
either, as the SUSE IcedTea OpenJDK seems not to include it (anymore?) in 
either Java 7 or Java 8.

However, in this case the ICC color space (PDICCBased) returns CMYK as its 
alternate color space, and for CMYK we have the alternative rendering via the 
system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
PDFBOX-3569.
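
For reference, a minimal sketch of switching that property on from code rather than on the command line. The property name is the one quoted above; the class name {{PureJavaCmykSwitch}} and the value {{true}} are assumptions made for illustration, and the property would presumably have to be set before the first page is rendered.

```java
// Sketch only: sets the system property named in this issue for the current JVM.
// The helper class and the value "true" are illustrative assumptions.
class PureJavaCmykSwitch {
    // Property name as quoted in the issue text (introduced with PDFBOX-3569).
    static final String PROPERTY =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    // Roughly equivalent to starting the JVM with -D<PROPERTY>=true.
    static void enable() {
        System.setProperty(PROPERTY, "true");
    }

    // Reads the property back as a boolean for a quick sanity check.
    static boolean isEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        enable();
        System.out.println(PROPERTY + " = " + isEnabled());
    }
}
```

The same effect can be had without code changes by passing the property as a {{-D}} argument when starting the JVM.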

The idea is now to have an option that forces use of the alternate color space 
instead of the ICC one, so that LCMS is bypassed in toRGBImage(). If the 
alternate color space is CMYK, the option has to be combined with the system 
property 'UsePureJavaCMYKConversion'.
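
The selection logic could look roughly like the sketch below. It is not the attached patch: the property name {{example.pdf.forceAlternateColorSpace}}, the {{ColorSpace}} interface and the {{select}} method are hypothetical stand-ins, chosen so the example runs without PDFBox on the classpath.

```java
// Sketch only: choose between an ICC-based color space and its alternate.
// All names here are hypothetical; this is not the PDFBox API or the patch.
class AlternateColorSpaceChoice {
    // Hypothetical switch, used only for this illustration.
    static final String FORCE_ALTERNATE = "example.pdf.forceAlternateColorSpace";

    // Minimal stand-in for a color space object.
    interface ColorSpace {
        String name();
    }

    // Returns the alternate only if the switch is on and an alternate exists;
    // otherwise falls back to the ICC-based color space.
    static ColorSpace select(ColorSpace iccBased, ColorSpace alternate) {
        boolean force = Boolean.getBoolean(FORCE_ALTERNATE);
        return (force && alternate != null) ? alternate : iccBased;
    }

    public static void main(String[] args) {
        ColorSpace icc = () -> "ICCBased";
        ColorSpace cmyk = () -> "DeviceCMYK";
        System.out.println("default: " + select(icc, cmyk).name());
        System.setProperty(FORCE_ALTERNATE, "true");
        System.out.println("forced:  " + select(icc, cmyk).name());
    }
}
```

Keeping the ICC path as the default means documents render exactly as before unless the switch is set explicitly.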

Using this approach the rendering time of the page with the problematic graphic 
drops from 780 seconds to 1 second!

It is clear that using the alternate color space might produce wrong or 
inexact colors. Therefore this mode should only be available as an opt-in 
option. However, for processing large collections of PDF documents (e.g. 
focusing on text) or for displaying a PDF in a timely manner, the performance 
improvement should outweigh the drop in image quality.

While the provided patch will always use the alternate color space once the 
option is activated, it would be possible at a later stage to add more 
intelligent logic which decides, based on a runtime analysis, when to use this 
mode (number of calls to LCMS, time needed etc.).
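
One way such a runtime analysis might be sketched is shown below. This is purely illustrative and not part of the patch: the class {{AdaptiveIccFallback}}, its thresholds and its method names are all hypothetical. The caller is assumed to time each ICC conversion itself and report the elapsed time.

```java
// Sketch only: decide at runtime whether to stop using ICC conversion,
// based on how many conversions were made and how long they took in total.
// Class, method names and thresholds are hypothetical.
class AdaptiveIccFallback {
    private final int maxCalls;     // switch once more conversions than this were seen
    private final long maxNanos;    // or once their accumulated time exceeds this
    private int calls;
    private long totalNanos;

    AdaptiveIccFallback(int maxCalls, long maxNanos) {
        this.maxCalls = maxCalls;
        this.maxNanos = maxNanos;
    }

    // Record one ICC conversion and the time it took.
    synchronized void record(long elapsedNanos) {
        calls++;
        totalNanos += elapsedNanos;
    }

    // True once either threshold has been exceeded.
    synchronized boolean shouldUseAlternate() {
        return calls > maxCalls || totalNanos > maxNanos;
    }

    public static void main(String[] args) {
        AdaptiveIccFallback policy = new AdaptiveIccFallback(1000, 5_000_000_000L);
        long start = System.nanoTime();
        // ... an ICC conversion would run here ...
        policy.record(System.nanoTime() - start);
        System.out.println("use alternate: " + policy.shouldUseAlternate());
    }
}
```

A policy object like this would be consulted before each conversion, so the first images on a page still get exact colors and only pathological pages fall back to the alternate color space.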

If there are no objections to this patch I will apply it in the next few days.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
