[ https://issues.apache.org/jira/browse/PDFBOX-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694351#comment-16694351 ]
Emmeran Seehuber commented on PDFBOX-4300:
------------------------------------------

[~sdanig] The color mismatch is caused by the different gamma curves of sRGB and whatever grayscale color profile the image had (color management fun ...). The PDDeviceGray profile has a gamma curve that depends on the output device, i.e. it can really vary. PDFBox may just assume an sRGB gamma curve (but I didn't look into it). To fix this you should tag the image with the right profile, i.e. instead of PDDeviceGray you build a PDICCBased profile; see the code in PredictorEncoder#preparePredictorPDImage() and search for ICC_Profile.

But: the approach of directly casting the image raster's data buffer to DataBufferByte is wrong. It works for your special case, but it won't work in general because:

* You don't respect any strides the image data may have, i.e. there may be trailing bytes at the end of every image line.
* The buffer does not have to be a DataBufferByte. It can e.g. also be a DataBufferShort for 16 bit images. Or it may be a memory mapped byte buffer (see e.g. [this|https://github.com/haraldk/TwelveMonkeys/blob/master/sandbox/sandbox-common/src/main/java/com/twelvemonkeys/image/MappedFileBuffer.java] class, which is sadly not released on maven central, but I use a copy of it very successfully in production with huge images).

So the right solution would be to do it like the predictor encoder and use {code:java} image.getRaster().getDataElements(){code} with the right array type. But you could also simply try to extend the PredictorEncoder so that it can handle grayscale images as well.
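
A minimal sketch of the getDataElements() approach, using only java.awt.image and not PDFBox's actual code. The class name GrayRasterReader and the method readGrayBytes are hypothetical, and the sketch assumes a single-band image with 8 or 16 bits per sample. It reads one row at a time through the raster, so strides and the concrete DataBuffer subclass do not matter:

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;

// Hypothetical helper for illustration; not part of PDFBox.
public class GrayRasterReader {

    /**
     * Copies the samples of a single-band image into a tightly packed array
     * (one byte per sample for 8 bit, two big-endian bytes per sample for 16 bit).
     */
    public static byte[] readGrayBytes(BufferedImage image) {
        Raster raster = image.getRaster();
        if (raster.getNumDataElements() != 1) {
            throw new IllegalArgumentException("expected one data element per pixel");
        }
        int width = raster.getWidth();
        int height = raster.getHeight();
        int minX = raster.getMinX();
        int minY = raster.getMinY();

        switch (raster.getTransferType()) {
            case DataBuffer.TYPE_BYTE: {
                byte[] out = new byte[width * height];
                byte[] row = new byte[width];
                for (int y = 0; y < height; y++) {
                    // The raster resolves strides and offsets for us.
                    raster.getDataElements(minX, minY + y, width, 1, row);
                    System.arraycopy(row, 0, out, y * width, width);
                }
                return out;
            }
            case DataBuffer.TYPE_USHORT: {
                byte[] out = new byte[width * height * 2];
                short[] row = new short[width];
                int pos = 0;
                for (int y = 0; y < height; y++) {
                    raster.getDataElements(minX, minY + y, width, 1, row);
                    for (short sample : row) {
                        out[pos++] = (byte) (sample >> 8); // high byte first
                        out[pos++] = (byte) sample;
                    }
                }
                return out;
            }
            default:
                throw new IllegalArgumentException(
                        "unsupported transfer type " + raster.getTransferType());
        }
    }
}
```

Reading row by row keeps the temporary buffer at one scanline. A sub-image created with getSubimage() is a convenient test case, since it shares the parent's data buffer and therefore has a stride larger than its own width.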
> Reduce im memory buffers when creating grayscale images
> -------------------------------------------------------
>
>                 Key: PDFBOX-4300
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4300
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.11
>            Reporter: Jesse Long
>            Priority: Minor
>              Labels: optimization
>         Attachments: PDFBOX-4300-1.patch
>
>
> LosslessFactory uses ByteArrayOutputStreams when creating PDF image data.
> First, it creates a BAOS in which to store the data, then a BAOS in which to
> store the flate encoded data. Finally the flate encoded data is written to
> the PDImageXObject's stream.
> We could instead create an empty PDStream, give it a filter, and write the
> image data directly into the stream. We then instantiate a PDImageXObject
> giving it the already created stream.
> This would dramatically reduce RAM requirement if a scratchfile is in play.
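
The idea in the issue description, writing the image data straight into a filtered stream instead of collecting it in intermediate ByteArrayOutputStreams, can be sketched with plain java.util.zip. This is an illustration under assumptions and not the attached patch: the class StreamingFlateWriter and its method are hypothetical, the image is assumed to be 8 bit single-band, and a generic OutputStream stands in for whatever stream the PDStream would provide.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper for illustration; not part of PDFBox.
public class StreamingFlateWriter {

    /**
     * Deflates an 8 bit single-band image row by row into the target stream.
     * Only one scanline is held in memory at a time. The target is left open.
     */
    public static void writeGrayFlate(BufferedImage image, OutputStream target)
            throws IOException {
        Raster raster = image.getRaster();
        if (raster.getNumDataElements() != 1
                || raster.getTransferType() != DataBuffer.TYPE_BYTE) {
            throw new IllegalArgumentException("expected an 8 bit single-band image");
        }
        int width = raster.getWidth();
        int height = raster.getHeight();
        byte[] row = new byte[width];

        Deflater deflater = new Deflater();
        try {
            DeflaterOutputStream out = new DeflaterOutputStream(target, deflater);
            for (int y = 0; y < height; y++) {
                raster.getDataElements(raster.getMinX(), raster.getMinY() + y, width, 1, row);
                out.write(row);
            }
            // Flush the remaining compressed data without closing the target.
            out.finish();
        } finally {
            // Release the native resources held by the deflater.
            deflater.end();
        }
    }
}
```

If the target stream is backed by a scratch file, neither the raw nor the compressed image data has to be held in RAM as a whole, which is the saving the description is after.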