[
https://issues.apache.org/jira/browse/PDFBOX-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694351#comment-16694351
]
Emmeran Seehuber commented on PDFBOX-4300:
------------------------------------------
[~sdanig] The color mismatch is caused by the difference between the sRGB gamma
curve and that of whatever grayscale color profile the image had (color
management fun ...). The PDDeviceGray color space has a gamma curve that depends
on the output device, i.e. it can really vary. PDFBox may just assume an sRGB
gamma curve (but I didn't look into it).
To fix this you should just tag the image with the right profile, i.e. instead
of PDDeviceGray you build a PDICCBased color space; see the code in
PredictorEncoder#preparePredictorPDImage() and search for ICC_Profile.
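As a minimal sketch of the first half of that step, the following uses only the JDK to pull the ICC profile out of a BufferedImage's color model. The class and method names (ImageProfileSketch, profileOf) are made up for illustration; wiring the resulting bytes into a PDICCBased color space is not shown here, and the actual code in PredictorEncoder may differ.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
final class ImageProfileSketch {

    /** Returns the ICC profile behind the image's color model, or null if there is none. */
    static ICC_Profile profileOf(BufferedImage image) {
        ColorSpace cs = image.getColorModel().getColorSpace();
        if (cs instanceof ICC_ColorSpace) {
            return ((ICC_ColorSpace) cs).getProfile();
        }
        return null; // not ICC based, the caller has to fall back to something else
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(4, 4, BufferedImage.TYPE_BYTE_GRAY);
        ICC_Profile profile = profileOf(gray);
        if (profile != null) {
            // These raw bytes are what would be embedded as the ICC stream in the PDF.
            byte[] iccBytes = profile.getData();
            System.out.println(profile.getNumComponents() + " component(s), "
                    + iccBytes.length + " profile bytes");
        }
    }
}
```

If the image carries a real grayscale profile, embedding it this way should let the viewer apply the matching gamma curve instead of guessing one.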
But: The approach of directly casting the image raster's data buffer to
DataBufferByte is wrong. It works for your special case, but it won't work
in general because:
* You don't respect any stride the image data may have, i.e. there may be
trailing bytes at the end of every single image line.
* The buffer does not have to be a DataBufferByte. It can e.g. also be a
DataBufferShort for 16 bit images. Or it may be a memory mapped byte buffer
(see e.g.
[this|https://github.com/haraldk/TwelveMonkeys/blob/master/sandbox/sandbox-common/src/main/java/com/twelvemonkeys/image/MappedFileBuffer.java]
class, which is sadly not released on Maven Central, but I use a copy of it
very successfully in production with huge images).
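To make the two points above concrete, here is a small JDK-only sketch that checks whether a raster is really a tightly packed byte buffer before anyone treats it as one. RasterLayoutSketch and isTightlyPackedBytes are hypothetical names, and the check is deliberately conservative rather than exhaustive.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ComponentSampleModel;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;

// Hypothetical helper, not part of PDFBox.
final class RasterLayoutSketch {

    /** True only if the raster is backed by a byte buffer without padding or offsets. */
    static boolean isTightlyPackedBytes(BufferedImage image) {
        Raster raster = image.getRaster();
        DataBuffer buffer = raster.getDataBuffer();
        if (buffer.getDataType() != DataBuffer.TYPE_BYTE) {
            return false; // e.g. 16 bit images use a short based buffer
        }
        if (!(raster.getSampleModel() instanceof ComponentSampleModel)) {
            return false; // packed layouts need their own handling
        }
        ComponentSampleModel sm = (ComponentSampleModel) raster.getSampleModel();
        int rowBytes = raster.getWidth() * sm.getPixelStride();
        // A larger stride means trailing bytes per line; a larger buffer means
        // the raster is only a window into someone else's data.
        return sm.getScanlineStride() == rowBytes
                && buffer.getSize() == rowBytes * raster.getHeight();
    }

    public static void main(String[] args) {
        BufferedImage full = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage sub = full.getSubimage(2, 2, 4, 4); // shares the parent's buffer
        BufferedImage deep = new BufferedImage(8, 8, BufferedImage.TYPE_USHORT_GRAY);
        System.out.println(isTightlyPackedBytes(full)); // true
        System.out.println(isTightlyPackedBytes(sub));  // false, stride is 8 for a width of 4
        System.out.println(isTightlyPackedBytes(deep)); // false, not a byte buffer
    }
}
```

The sub-image case is an easy way to reproduce the stride problem: it keeps the parent's scanline stride, so a raw copy of the buffer would contain pixels that are not part of the image.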
So the right solution would be to do it like the predictor encoder and use
{code:java}
image.getRaster().getDataElements(){code}
with the right array type. Alternatively, you could simply try to extend the
PredictorEncoder so that it can also handle grayscale images.
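A rough sketch of what reading through getDataElements could look like for single-band 8 bit or 16 bit gray images, using only the JDK. GrayRowWriterSketch and writeGraySamples are hypothetical names, big-endian output for 16 bit samples is an assumption about what the consumer expects, and this is not the PredictorEncoder code itself.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not part of PDFBox.
final class GrayRowWriterSketch {

    /** Writes single-band samples row by row; 16 bit samples go out big-endian. */
    static void writeGraySamples(BufferedImage image, OutputStream out) throws IOException {
        Raster raster = image.getRaster();
        if (raster.getNumBands() != 1) {
            throw new IllegalArgumentException("expected a single-band image");
        }
        int w = raster.getWidth();
        int transferType = raster.getTransferType();
        Object row = null; // the first call allocates an array of the right type
        for (int y = 0; y < raster.getHeight(); y++) {
            // The raster resolves strides and offsets, so only real pixels come back.
            row = raster.getDataElements(raster.getMinX(), raster.getMinY() + y, w, 1, row);
            if (transferType == DataBuffer.TYPE_BYTE) {
                out.write((byte[]) row, 0, w);
            } else if (transferType == DataBuffer.TYPE_USHORT) {
                for (short s : (short[]) row) {
                    out.write(s >> 8); // high byte
                    out.write(s);      // low byte
                }
            } else {
                throw new IllegalArgumentException("unsupported transfer type " + transferType);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage full = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage sub = full.getSubimage(2, 2, 4, 4);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeGraySamples(sub, out);
        System.out.println(out.size() + " bytes written");
    }
}
```

Because the method takes any OutputStream, the rows could go straight into a compressing stream instead of an intermediate array.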
> Reduce in memory buffers when creating grayscale images
> -------------------------------------------------------
>
> Key: PDFBOX-4300
> URL: https://issues.apache.org/jira/browse/PDFBOX-4300
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 2.0.11
> Reporter: Jesse Long
> Priority: Minor
> Labels: optimization
> Attachments: PDFBOX-4300-1.patch
>
>
> LosslessFactory uses ByteArrayOutputStreams when creating PDF image data.
> First, it creates a BAOS in which to store the data, then a BAOS in which to
> store the flate encoded data. Finally the flate encoded data is written to
> the PDImageXObject's stream.
> We could instead create an empty PDStream, give it a filter, and write the
> image data directly into the stream. We then instantiate a PDImageXObject
> giving it the already created stream.
> This would dramatically reduce RAM requirement if a scratchfile is in play.