[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618743#comment-16618743 ]
Emmeran Seehuber commented on PDFBOX-4184:
------------------------------------------

[~tilman] If an image carries an ICC profile that is not the built-in sRGB profile, you need that ICC profile; otherwise the colors will simply be wrong. You should not think of (r,g,b) or (c,m,y,k) as concrete color values, but rather as vectors within a color space. Without a profile describing that vector space/color space, you have no idea which real colors the vector values correspond to.

DeviceRGB is (on screen) often interpreted as sRGB. But what DeviceCMYK means is really up to the concrete interpreting device, i.e. it will look different on every printer (brightness, color, ...). So DeviceCMYK as a color space for an image mostly means "random" unless you are explicitly targeting one specific printer. The ICC profile describes how to transform the color vector data into other color spaces, e.g. into sRGB for viewing on screen, or into the concrete ICC profile of the printing device.

If you load images in Java using ImageIO, you usually (especially when using TwelveMonkeys) get an sRGB image, so you would never hit this path. If you want to load an image with its real color profile, you must pass a specially prepared BufferedImage (i.e. one with the right profile) into ImageIO. So you won't get an image with a color space other than sRGB by accident. If you do have an image with an ICC profile, you always want the image in that color space with the profile attached, since it is already not so easy to get the image in anything other than sRGB.

Regarding file size bloat: Yes, the ICC profiles will add up, especially if you have many images. The correct solution would be an ICC_Profile <-> PDICCBased cache in the document, so that the same profile does not get encoded twice. Should I implement such a cache? In my application I manually deduplicate the ICC profiles at the moment.
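To make the cache idea above concrete, here is a minimal sketch under stated assumptions. IccProfileCache and getOrCreate are hypothetical names and not part of PDFBox; the type parameter V merely stands in for PDICCBased, and the factory that would build the real PDF object is left to the caller. Two profiles are treated as "the same" only if their bytes are identical, which is checked via a SHA-256 digest of ICC_Profile.getData().

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-document cache (not PDFBox API).
 * V stands in for the PDF-side object, e.g. PDICCBased.
 */
final class IccProfileCache<V> {
    // ByteBuffer compares by content, so it works as a map key for the digest.
    private final Map<ByteBuffer, V> entries = new HashMap<>();

    /** Returns the value cached for a byte-identical profile, or creates and stores one. */
    V getOrCreate(ICC_Profile profile, Function<ICC_Profile, V> factory) {
        ByteBuffer key = ByteBuffer.wrap(sha256(profile.getData()));
        return entries.computeIfAbsent(key, k -> factory.apply(profile));
    }

    /** Number of distinct profiles seen so far. */
    int size() {
        return entries.size();
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

With one such cache per document, every image that shares a profile would reuse the same encoded object instead of embedding the profile again. Because the key is derived from the raw bytes, profiles that differ only in metadata would still be stored separately; whether that matters depends on where the profiles come from.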
The attached patch [^fix_profile_use4.patch] fixes the test driver and also specifies an "Alternate" color space for the profile, for all those devices which cannot handle ICC profiles. With the correct ICC_Profile now specified, the "roundtrip" sRGB->ISO Coated->sRGB also works correctly, so the image can be compared with the original image.

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-4184
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4184
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.9
>            Reporter: Emmeran Seehuber
>            Priority: Minor
>             Fix For: 2.0.12, 3.0.0 PDFBox
>
>         Attachments: 16bit.png, LoadGovdocs.java, fix_profile_use.patch,
> fix_profile_use3.patch, fix_profile_use4.patch, images.zip,
> lossless_predictor_based_imageencoding.patch,
> lossless_predictor_based_imageencoding_v2.patch,
> lossless_predictor_based_imageencoding_v3.patch,
> lossless_predictor_based_imageencoding_v4.patch,
> lossless_predictor_based_imageencoding_v5.patch,
> lossless_predictor_based_imageencoding_v6.patch,
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf,
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf,
> size_compare.txt
>
>
> The attached patch adds support for writing 16 bit per component images
> correctly. I've integrated a test for this here:
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-bit TYPE_CUSTOM images with DataType == USHORT, but this
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as
> the images are currently not encoded efficiently. I.e. you could use PNG
> encodings to get better compression (by adding a COSName.DECODE_PARMS with
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is
> something for a later patch. It would also need another API, as there is a
> tradeoff between speed and compression ratio.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)