[
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-4341:
------------------------------------
Comment: was deleted
(was: Damn... just as I wanted to commit the rest, svn is down (server
migration).)
> [Patch] PNGConverter: PNG bytes to PDImageXObject converter
> -----------------------------------------------------------
>
> Key: PDFBOX-4341
> URL: https://issues.apache.org/jira/browse/PDFBOX-4341
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.12
> Reporter: Emmeran Seehuber
> Priority: Minor
> Attachments: 001229.png, 001230.png, 008528.png, 014431.png,
> 016289.png, 017012.png, 017030.png, 017063.png, 017084.png,
> image-2018-10-25-09-29-47-251.png, optimized.zip, pngconvert_testimg.zip,
> pngconvert_v1.patch, pngconvert_v2.patch, pngconvert_v3.patch
>
>
> The attached patch implements a PNG bytes to PDImageXObject converter. It
> tries to create a PDImageXObject from the chunks of a PNG image without
> recompressing it. This makes it possible to use programs like pngcrush and
> friends to embed optimally compressed images. It is also much faster than
> recompressing the image.
> The class PNGConverter does this in three steps:
> - Parsing the PNG chunk structure from the byte array
> - Validating all relevant data chunks (i.e. checking the CRC). Chunks which
> are not needed (e.g. text chunks) are not validated.
> - Constructing a PDImageXObject from the chunks
> If an error occurs in any of these steps, or the converter detects that it
> cannot map the image, it bails out and returns null. In that case the image
> has to be embedded the "normal" way, by reading it with ImageIO and
> compressing it again.
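
The first two steps described above (parsing the chunk structure, then checking the CRC of the relevant chunks, with null returned on any failure) can be sketched with the JDK alone. This is not the code from the attached patch: the class name PngChunkSketch, its method names, and the set of chunk types treated as "relevant" are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical sketch, not the PNGConverter from the patch.
final class PngChunkSketch {
    // The fixed 8-byte signature that starts every PNG file.
    static final byte[] SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // Assumption: which chunk types count as "relevant" is chosen here for illustration.
    static final Set<String> RELEVANT = new HashSet<>(Arrays.asList(
            "IHDR", "PLTE", "IDAT", "tRNS", "gAMA", "cHRM", "sRGB", "iCCP", "IEND"));

    /** One chunk: its 4-character type, where its data lives, and the stored CRC. */
    static final class Chunk {
        final String type;
        final int dataStart;
        final int length;
        final int crc;

        Chunk(String type, int dataStart, int length, int crc) {
            this.type = type;
            this.dataStart = dataStart;
            this.length = length;
            this.crc = crc;
        }
    }

    /** Step 1: split the byte array into chunks; returns null on malformed input. */
    static List<Chunk> parse(byte[] png) {
        if (png.length < SIGNATURE.length
                || !Arrays.equals(Arrays.copyOf(png, SIGNATURE.length), SIGNATURE)) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(png); // big-endian by default, as PNG requires
        buf.position(SIGNATURE.length);
        List<Chunk> chunks = new ArrayList<>();
        while (buf.remaining() >= 12) { // length + type + CRC of an empty chunk
            int length = buf.getInt();
            if (length < 0 || length > buf.remaining() - 8) {
                return null; // declared length runs past the end of the array
            }
            byte[] typeBytes = new byte[4];
            buf.get(typeBytes);
            int dataStart = buf.position();
            buf.position(dataStart + length); // skip the data, it is not copied
            int crc = buf.getInt();
            String type = new String(typeBytes, StandardCharsets.US_ASCII);
            chunks.add(new Chunk(type, dataStart, length, crc));
            if ("IEND".equals(type)) {
                return chunks;
            }
        }
        return null; // no IEND chunk found
    }

    /** Step 2: the CRC covers the 4 type bytes followed by the chunk data. */
    static boolean crcValid(byte[] png, Chunk c) {
        CRC32 crc = new CRC32();
        crc.update(png, c.dataStart - 4, c.length + 4);
        return (int) crc.getValue() == c.crc;
    }

    /** Parses and validates; returns null so the caller can fall back to ImageIO. */
    static List<Chunk> parseAndValidate(byte[] png) {
        List<Chunk> chunks = parse(png);
        if (chunks == null) {
            return null;
        }
        for (Chunk c : chunks) {
            if (RELEVANT.contains(c.type) && !crcValid(png, c)) {
                return null;
            }
        }
        return chunks;
    }
}
```

A caller would treat a null result from parseAndValidate(bytes) as the signal to fall back to the ImageIO path; chunks outside the RELEVANT set (e.g. tEXt) are parsed but never CRC-checked.
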
> Only these PNG image types can be converted (at least theoretically) without
> recompressing the image data:
> - Grayscale
> - Truecolor (i.e. RGB 8-Bit/16-Bit)
> - Indexed
> As soon as transparency is used, it gets difficult:
> - Grayscale with alpha / truecolor with alpha: The alpha channel is saved in
> the image data stream, as the pixels are stored as (Gray,Alpha) or
> (Red,Green,Blue,Alpha) tuples. The alpha information has to be separated out
> for the SMASK image. At the moment you can only read such images and
> recompress them using the LosslessFactory.
> - Indexed with alpha: Alpha and color tables are separate in the PNG, so
> it should be possible to build a grayscale SMASK from the image data (which
> are just the table indices) and the alpha table. I tried that, but Acrobat
> Reader does not like indexed SMASKs… One could instead build a plain
> grayscale SMASK from the alpha table and the decompressed image index data.
> This would at least save some space, as the optimized indexed image data is
> still used.
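
Both transparency cases above come down to producing a separate 8-bit alpha plane for the SMASK. The following is a minimal sketch of that step only, not code from the patch. It assumes 8-bit samples that are already inflated and unfiltered, with no interlacing; the class PngAlphaSketch and both method names are hypothetical.

```java
// Hypothetical sketch: inputs are assumed to be decompressed, unfiltered 8-bit samples.
final class PngAlphaSketch {

    /**
     * Truecolor with alpha: splits (Red,Green,Blue,Alpha) tuples into
     * an RGB plane (index 0) and an alpha plane (index 1).
     */
    static byte[][] splitRgba(byte[] rgba) {
        if (rgba.length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        int pixels = rgba.length / 4;
        byte[] rgb = new byte[pixels * 3];
        byte[] alpha = new byte[pixels];
        for (int i = 0; i < pixels; i++) {
            rgb[3 * i] = rgba[4 * i];
            rgb[3 * i + 1] = rgba[4 * i + 1];
            rgb[3 * i + 2] = rgba[4 * i + 2];
            alpha[i] = rgba[4 * i + 3];
        }
        return new byte[][] {rgb, alpha};
    }

    /**
     * Indexed with alpha: maps every palette index through the alpha table
     * (the tRNS chunk data) to get a plain grayscale mask. Indices beyond the
     * end of the alpha table are fully opaque, as the PNG format defines.
     */
    static byte[] indexedAlphaMask(byte[] indices, byte[] alphaTable) {
        byte[] mask = new byte[indices.length];
        for (int i = 0; i < indices.length; i++) {
            int index = indices[i] & 0xFF; // treat the byte as unsigned
            mask[i] = index < alphaTable.length ? alphaTable[index] : (byte) 0xFF;
        }
        return mask;
    }
}
```

In the indexed case only the mask has to be built from decompressed data; the original compressed index stream could still be used unchanged for the image itself, which is where the space saving mentioned above would come from.
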
> With the current patch, only truecolor images without alpha work correctly.
> The other tests, for grayscale and indexed, fail. (To run the test drivers
> you must place the zipped images in the resources folder where png.png
> resides. These images are "original" work done by me using Gimp, Krita and
> ImageOptim (on macOS) to build the different PNG image types.)
> Notes for the current patch:
> - The grayscale images have the wrong gamma curve. I tried using the
> ColorSpace.CS_GRAY ICC profile, and the image now seems to be only "slightly"
> off (i.e. pixel value FFD6D6D6 vs FFD7D7D7). As soon as a gAMA chunk is
> present, the image is tagged with a CalGray color space, but then the colors
> are off by much more.
> - The cHRM (chroma) chunk is read and *should* work, as I used the formulas
> from the PDF spec to convert the cHRM values to the CalRGB whitepoint and
> matrix. I have not yet tested this, as I have no test image with cHRM at the
> moment. Note: Matrix(COSArray) and Matrix.toCOSArray() are fine for geometric
> matrices, but these methods are wrong for any other kind of matrix (e.g.
> color transform matrices), as they only store/restore 6 values of the 3x3
> matrix. I deprecated PDCalRGB.setMatrix(Matrix) because of this: it never
> worked and cannot work as long as the Matrix class is for geometric use cases
> only. The Matrix class documentation should also state that it is not general
> purpose. I added a PDCalRGB.setMatrix(COSArray) method so that the matrix can
> be set.
> - The indexed image displays fine in Acrobat Reader, but the test driver
> fails as PDImageXObject.getImage() returns a completely black (everything 0)
> image. Strange; I suspect some error in the PDFBox image decoding.
> - If an image is tagged with sRGB, the built-in Java sRGB ICC profile is
> attached. Theoretically you can use a CalRGB colorspace, but using an ICC
> color profile is likely faster (at least in PDFBox) and more "standard".
> You can also look at this patch on GitHub
> [https://github.com/apache/pdfbox/compare/2.0...rototor:2.0-png-from-bytes-encoder?expand=1]
> if you like.
> It would be nice if someone could give me some hints on the colorspace
> problems. I will reread the specs; maybe I have missed something. But it
> would be great if someone else who knows about colorspaces could also take a
> look at this.
> As I have no idea how long it will take to understand why the colors are off
> for grayscale and wrong for indexed, I could prepare a stripped-down version
> of this patch that contains only the working parts (i.e. truecolor) and does
> nothing in the cases that do not work. What do you think?