[jira] [Commented] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

Emmeran Seehuber (Jira) Tue, 26 May 2020 14:27:10 -0700


    [ https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117050#comment-17117050 ]


Emmeran Seehuber commented on PDFBOX-4847:
------------------------------------------

The bug in PNGConverter was that it did not write the ICC profile correctly. 
It had an off-by-one error: it did not skip the 0-byte terminator that follows 
the profile name (the name takes up to 79 bytes at the start of the iCCP chunk 
and is followed by a 0 byte). It also did not mark the stream as FLATE_DECODE.

Because of this, PDFBox (and likely all other PDF readers) simply ignored the 
ICC profile (an exception occurred while decoding it). As a result the colors 
were not correct: the wrong color space was used, namely the alternate 
DeviceRGB instead of the embedded profile.
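
To illustrate the chunk layout described above, here is a minimal JDK-only 
sketch of how the compressed profile data can be located in an iCCP chunk 
body. The layout (name, 0-byte terminator, one compression-method byte, zlib 
data) follows the PNG specification. The class and method names are 
hypothetical; this is not the PDFBox code.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper for illustration only; not part of PDFBox.
final class IccpChunkSketch
{
    private IccpChunkSketch()
    {
    }

    /** Returns the offset of the zlib data in the chunk body, or -1 if malformed. */
    static int compressedDataOffset(byte[] chunk)
    {
        int pos = 0;
        // Walk over the profile name (at most 79 bytes) up to its 0-byte terminator.
        while (pos < 80 && pos < chunk.length && chunk[pos] != 0)
        {
            pos++;
        }
        if (pos >= 80 || pos >= chunk.length)
        {
            return -1; // no terminator within the allowed name length
        }
        pos++; // skip the 0-byte terminator itself
        if (pos >= chunk.length || chunk[pos] != 0)
        {
            return -1; // only compression method 0 (deflate) is defined
        }
        pos++; // skip the compression method byte
        return pos < chunk.length ? pos : -1;
    }

    /** Inflates the zlib data starting at the given offset. */
    static byte[] inflate(byte[] chunk, int offset)
    {
        Inflater inflater = new Inflater();
        inflater.setInput(chunk, offset, chunk.length - offset);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try
        {
            while (!inflater.finished())
            {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                {
                    throw new IllegalArgumentException("Truncated or invalid zlib data");
                }
                out.write(buffer, 0, n);
            }
        }
        catch (DataFormatException e)
        {
            throw new IllegalArgumentException("Invalid zlib data", e);
        }
        finally
        {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

If the terminator is not skipped, every later offset is one byte too small, so 
the data no longer starts at the zlib header and cannot be decoded. Since the 
data is kept in its compressed form, the stream that carries it has to declare 
a Flate filter.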

The minimal patch would be:
{code:java}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
index f17cdd7cd..866cfbfba 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
@@ -400,11 +400,15 @@ final class PNGConverter
         if (state.iCCP != null || state.sRGB != null)
         {
             // We have got a color profile, which we must attach
             cosStream.setInt(COSName.N, colorSpace.getNumberOfComponents());
             cosStream.setItem(COSName.ALTERNATE, colorSpace.getNumberOfComponents()
                     == 1 ? COSName.DEVICEGRAY : COSName.DEVICERGB);
             if (state.iCCP != null)
             {
+                cosStream.setItem(COSName.FILTER, COSName.FLATE_DECODE);
                 // We need to skip over the name
@@ -415,6 +419,7 @@ final class PNGConverter
                         break;
                     iccProfileDataStart++;
                 }
+                iccProfileDataStart++;
                 if (iccProfileDataStart >= state.iCCP.length)
                 {
                     LOG.error("Invalid iCCP chunk, to few bytes");
{code}
But this will cause test failures in PNGConverterTest, because the image now 
has the right colors, but
 - the JDK does not respect the embedded color profile in PNG images. Without 
the fix for this in PNGConverterTest the colors will be "miles" off when the 
PNG is read for comparison using ImageIO.
 - comparing sRGB images does not work even after applying that fix, as there 
are some color rounding differences (off by 1 on the first pixel, for whatever 
reason, likely some different color conversion paths somewhere). There is a 
massive difference between converting single pixel values between color spaces 
and converting a whole image at once (using ColorConvertOp). The latter may 
choose slightly different colors depending on the rendering intent and the 
colors in use in the image. The image from PDImage.getImage() would have been 
converted with ColorConvertOp, but in checkIdent(), which uses getRGB(), the 
image read with ImageIO would be color converted "pixel by pixel". One could 
fix this by first converting the expected image to sRGB using ColorConvertOp 
if it is not yet in sRGB.
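
The conversion suggested in the last point could look roughly like this. It is 
a sketch under assumptions, not the code from the patch: the class and method 
names are made up, and the destination is an 8-bit INT_RGB/INT_ARGB image, 
which should be enough for a getRGB()-based comparison.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

// Hypothetical test helper for illustration only; not part of PDFBox.
final class SrgbNormalizeSketch
{
    private SrgbNormalizeSketch()
    {
    }

    /** Returns the image itself if it is already sRGB, else a whole-image converted copy. */
    static BufferedImage toSRGB(BufferedImage source)
    {
        ColorSpace colorSpace = source.getColorModel().getColorSpace();
        if (colorSpace.isCS_sRGB())
        {
            return source;
        }
        int type = source.getColorModel().hasAlpha()
                ? BufferedImage.TYPE_INT_ARGB : BufferedImage.TYPE_INT_RGB;
        // Both INT types use an sRGB color model, so they serve as the conversion target.
        BufferedImage target = new BufferedImage(source.getWidth(), source.getHeight(), type);
        // With this constructor the op converts from the source image's color space
        // to the destination image's color space; the destination must not be null.
        new ColorConvertOp(null).filter(source, target);
        return target;
    }
}
```

A comparison helper such as checkIdent() could then call 
SrgbNormalizeSketch.toSRGB(expectedImage) before reading pixels with getRGB(), 
so that both images go through a whole-image conversion.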

If you want to apply this fix alone, you would need to temporarily disable the 
test
{code:java}
PNGConverterTest.testImageConversionRGB16BitICC(){code}
The others should still work. Alternatively, you could extend checkIdent() to 
correctly convert non-sRGB BufferedImages to sRGB first. I can also provide a 
patch for that if you like.

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> ------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4847
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4847
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: PDModel, Writing
>    Affects Versions: 2.0.19
>            Reporter: Emmeran Seehuber
>            Priority: Minor
>              Labels: feature, patch
>         Attachments: color_difference.png, pdfbox-rawimages.patch
>
>
> This patch was primarily intended to add access to raw image data (i.e. 
> without any kind of color conversion/reduction). While implementing and 
> testing it I also found a bug with ICC profile embedding in the PNGConverter.
> This patch does the following:
>  - add a method getRawRaster() to PDImage. This allows reading the original 
> raster data in 8 or 16 bit without any kind of color interpretation. Users 
> must know themselves what they want to do with it (e.g. to access the raw 
> data of DeviceN images).
>  - add a method getRawImage(). It tries to return the raster obtained by 
> getRawRaster() as a BufferedImage. This is only successful if there is a 
> matching Java ColorSpace for the colorspace of the image, i.e. only for 
> ICCBased images. In theory this should also work for PDIndexed sRGB images, 
> but I have to find a PDF with such an image first to test it.
>  - add a -noColorConversion switch to the ExtractImage utility to extract 
> images in their original colorspace. For CMYK images this only works when a 
> TIFF encoder (e.g. from TwelveMonkeys) is in the class path.
>  - add support to export PNGs with ICC profile data in ImageIOUtil.
>  - fix a bug in PNGConverter which does not correctly embed the ICC profile 
> from the png file.
>  - the PNGConverterTest tests the raw images. While reading PNG files for 
> comparison it also ensures that the embedded ICC profile is correctly 
> respected. The default PNG reader, at least up to JDK 11, does *not* respect 
> the embedded ICC profile, i.e. the colors are wrong. But there is a 
> workaround for this in the PNGConverterTest (which I have had in production 
> for years now). See the screenshot for the correct color display of the 
> png_rgb_romm_16.png test file (left side; macOS Preview app) and the wrong 
> display (right side; Java; inside IDEA).
>  
> Besides finding bugs like the one in the PNGConverter, access to the raw 
> image also allows doing all kinds of fun color things. E.g. a future patch 
> could allow using the raw images to print PDFs. If the PDF you want to print 
> has images with a gamut > sRGB (i.e. all modern cameras) and the target 
> printer also has a gamut > sRGB (i.e. some ink photo printer) you will for 
> sure see a difference in the resulting print. Such a mode would be rather 
> slow, as the current sRGB image handling is optimized for speed and using the 
> original raw images would need on-demand color conversions in the printer 
> driver. But you get "high quality" out of it (at least with respect to 
> colors).
> I don't think this is in time for the 2.0.20 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

