[ https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emmeran Seehuber updated PDFBOX-4341:
-------------------------------------
    Attachment: pngconvert_v2.patch

> [Patch] PNGConverter: PNG bytes to PDImageXObject converter
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4341
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4341
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.12
>            Reporter: Emmeran Seehuber
>            Priority: Minor
>         Attachments: pngconvert_testimg.zip, pngconvert_v1.patch,
> pngconvert_v2.patch
>
>
> The attached patch implements a PNG bytes to PDImageXObject converter. It
> tries to create a PDImageXObject from the chunks of a PNG image without
> recompressing it. This makes it possible to use tools such as pngcrush to
> embed optimally compressed images. It is also much faster than
> recompressing the image.
> The class PNGConverter does this in three steps:
> - Parsing the PNG chunk structure from the byte array
> - Validating all relevant data chunks (i.e. checking the CRC). Chunks that
> are not needed (e.g. text chunks) are not validated.
> - Constructing a PDImageXObject from the chunks
> If an error occurs in any of these steps, or the converter detects that the
> image cannot be mapped, it bails out and returns null. In this case the
> image has to be embedded the "normal" way, by reading it with ImageIO and
> compressing it again.
> Only these PNG image types can be converted (at least theoretically)
> without recompressing the image data:
> - Grayscale
> - Truecolor (i.e. RGB 8-bit/16-bit)
> - Indexed
> As soon as transparency is used it gets difficult:
> - Grayscale with alpha / truecolor with alpha: The alpha channel is stored
> in the image data stream, because the pixels are stored as (Gray,Alpha) or
> (Red,Green,Blue,Alpha) tuples. The alpha information has to be separated
> out for the SMASK image. For now, such images can only be read and
> recompressed using the LosslessFactory.
> - Indexed with alpha.
> Alpha and color tables are separate in the PNG, so it should be possible
> to build a grayscale SMASK from the image data (which are just the table
> indices) and the alpha table. I tried that, but Acrobat Reader does not
> like indexed SMASKs... One could instead build a grayscale SMASK from the
> alpha table and the decompressed image index data. This would at least
> save some space, as the optimized indexed image data is still used.
> With the current patch only truecolor images without alpha work correctly.
> The other tests, for grayscale and indexed, fail. (To run the test drivers
> you must place the zipped images in the resources folder where png.png
> resides. These images are "original" work done by me using Gimp, Krita and
> ImageOptim (on macOS) to build the different PNG image types.)
> Notes for the current patch:
> - The grayscale images have the wrong gamma curve. I tried using the
> ColorSpace.CS_GRAY ICC profile, and the image now seems only "slightly" off
> (i.e. pixel value FFD6D6D6 vs FFD7D7D7). As soon as a gAMA chunk is given,
> the image is tagged with a CalGray profile, but the colors are then much
> further off.
> - The cHRM (chroma) chunk is read and *should* work, as I used the formulas
> from the PDF spec to convert the cHRM values to the CalRGB whitepoint and
> matrix. I have not yet tested this, as I have no test image with cHRM at
> the moment. Note: Matrix(COSArray) and Matrix.toCOSArray() are fine for
> geometric matrices. But these methods are wrong for any other kind of
> matrix (i.e. color transform matrices), as they only store/restore 6 values
> of the 3x3 matrix. I deprecated PDCalRGB.setMatrix(Matrix) because of this:
> it never worked and cannot work as long as the Matrix class is for
> geometric use cases only. The Matrix class should also document that it is
> not general purpose. I added a PDCalRGB.setMatrix(COSArray) method so that
> the matrix can be set.
> - The indexed image displays fine in Acrobat Reader, but the test driver
> fails because PDImageXObject.getImage() returns a completely black
> (everything 0) image. Strange; I suspect an error in the PDFBox image
> decoding.
> - If an image is tagged with sRGB, the built-in Java sRGB ICC profile is
> attached. Theoretically a CalRGB colorspace could be used, but using an ICC
> color profile is likely faster (at least in PDFBox) and more "standard".
> You can also look at this patch on GitHub
> [https://github.com/apache/pdfbox/compare/2.0...rototor:2.0-png-from-bytes-encoder?expand=1]
> if you like.
> It would be nice if someone could give me some hints on the colorspace
> problems. I will reread the specs; maybe I have missed something. But it
> would be great if someone else who knows about colorspaces could also take
> a look at this.
> As I have no idea how long it will take to understand why the colors are
> off for grayscale and wrong for indexed, I could prepare a stripped-down
> version of this patch that contains only the working parts (i.e.
> truecolor) and simply does nothing in the non-working cases. What do you
> think?
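
The first two steps described in the issue (parsing the PNG chunk structure and checking the CRC of the relevant chunks only) might look roughly like the sketch below. It is an illustration under assumptions, not the code from the attached patch: the class PngChunkSketch, its methods, and the RELEVANT list of chunk types are hypothetical, and only JDK classes are used. Like the converter described above, it returns null when the input cannot be handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical helper for illustration; not the PNGConverter class from the patch. */
final class PngChunkSketch {
    static final byte[] SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    /** Assumed set of chunk types worth validating; the patch may use a different list. */
    static final List<String> RELEVANT =
            Arrays.asList("IHDR", "PLTE", "IDAT", "IEND", "tRNS", "gAMA", "cHRM", "sRGB", "iCCP");

    /** One PNG chunk: 4-character type, payload, and the CRC stored in the file. */
    static final class Chunk {
        final String type;
        final byte[] data;
        final int storedCrc;

        Chunk(String type, byte[] data, int storedCrc) {
            this.type = type;
            this.data = data;
            this.storedCrc = storedCrc;
        }

        /** The PNG CRC covers the type bytes followed by the data bytes. */
        boolean crcMatches() {
            CRC32 crc = new CRC32();
            crc.update(type.getBytes(StandardCharsets.ISO_8859_1));
            crc.update(data);
            return (int) crc.getValue() == storedCrc;
        }
    }

    /** Step 1: split the byte array into chunks; returns null on malformed input. */
    static List<Chunk> parse(byte[] png) {
        if (png.length < SIGNATURE.length
                || !Arrays.equals(Arrays.copyOf(png, SIGNATURE.length), SIGNATURE)) {
            return null;
        }
        // PNG integers are big-endian, which is ByteBuffer's default byte order.
        ByteBuffer buf = ByteBuffer.wrap(png, SIGNATURE.length, png.length - SIGNATURE.length);
        List<Chunk> chunks = new ArrayList<>();
        while (buf.remaining() >= 12) { // 4 length + 4 type + 4 CRC
            int length = buf.getInt();
            if (length < 0 || length > buf.remaining() - 8) {
                return null; // declared length does not fit into the remaining bytes
            }
            byte[] type = new byte[4];
            buf.get(type);
            byte[] data = new byte[length];
            buf.get(data);
            chunks.add(new Chunk(new String(type, StandardCharsets.ISO_8859_1), data, buf.getInt()));
        }
        return buf.hasRemaining() ? null : chunks;
    }

    /** Step 2: check the CRC of relevant chunks only; others (e.g. tEXt) are skipped. */
    static boolean validate(List<Chunk> chunks) {
        if (chunks == null || chunks.isEmpty() || !"IHDR".equals(chunks.get(0).type)) {
            return false;
        }
        for (Chunk c : chunks) {
            if (RELEVANT.contains(c.type) && !c.crcMatches()) {
                return false;
            }
        }
        return true;
    }

    /** Encodes one chunk with a correct CRC; only used to build demo input. */
    static byte[] encodeChunk(String type, byte[] data) {
        byte[] typeBytes = type.getBytes(StandardCharsets.ISO_8859_1);
        CRC32 crc = new CRC32();
        crc.update(typeBytes);
        crc.update(data);
        return ByteBuffer.allocate(12 + data.length)
                .putInt(data.length).put(typeBytes).put(data).putInt((int) crc.getValue())
                .array();
    }
}
```

A caller would run parse() and then validate(), and fall back to the ImageIO route whenever either step fails. Skipping the CRC check for unneeded chunks means a damaged text chunk does not block the fast path.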