[jira] [Commented] (PDFBOX-4300) Reduce im memory buffers when creating grayscale images

Emmeran Seehuber (JIRA) Tue, 20 Nov 2018 23:50:24 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694351#comment-16694351
 ]


Emmeran Seehuber commented on PDFBOX-4300:
------------------------------------------

[~sdanig] The color mismatch is caused by the different gamma curves of sRGB 
and whatever grayscale color profile the image had (color management fun ...). 
The PDDeviceGray color space has a gamma curve that depends on the output 
device, i.e. it can really vary. PDFBox may just assume an sRGB gamma curve 
(but I didn't look into it).

To fix this you should just tag the image with the right profile, i.e. instead 
of using PDDeviceGray you build a PDICCBased color space from the image's ICC 
profile. See the code in PredictorEncoder#preparePredictorPDImage() and search 
for ICC_Profile.
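
As a rough sketch of the JDK side of that step, the ICC profile can be read from the image's color model as shown below. The class and method names (GrayProfileLookup, findProfile) are made up for illustration. Embedding the bytes into a PDICCBased color space is the part done in PredictorEncoder and is not reproduced here.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
final class GrayProfileLookup {

    /** Returns the ICC profile of the image's color space, or null if it has none. */
    static ICC_Profile findProfile(BufferedImage image) {
        ColorSpace cs = image.getColorModel().getColorSpace();
        return (cs instanceof ICC_ColorSpace) ? ((ICC_ColorSpace) cs).getProfile() : null;
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(4, 4, BufferedImage.TYPE_BYTE_GRAY);
        ICC_Profile profile = findProfile(gray);
        if (profile != null) {
            // These bytes are what would be written into the ICC based color space stream.
            byte[] data = profile.getData();
            System.out.println("components: " + profile.getNumComponents()
                    + ", profile bytes: " + data.length);
        }
    }
}
```

If findProfile returns null, there is no profile to embed and falling back to PDDeviceGray is probably the only option.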

But: the approach of directly casting the image raster's data buffer to 
DataBufferByte is wrong. It works for your special case, but it won't work 
in general because:
 * You don't respect any strides the image data may have, i.e. there may be 
trailing bytes at the end of every image line.
 * The buffer does not have to be a DataBufferByte. It can e.g. also be a 
DataBufferShort for 16 bit images. Or it may be a memory mapped byte buffer 
(see e.g. 
[this|https://github.com/haraldk/TwelveMonkeys/blob/master/sandbox/sandbox-common/src/main/java/com/twelvemonkeys/image/MappedFileBuffer.java]
 class, which is sadly not released on Maven Central, but I use a copy of it 
very successfully in production with huge images).
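
Both points can be reproduced with plain JDK images. The following is only a sketch with a made-up helper (RasterLayoutCheck.isPackedByteGray). It checks whether the backing buffer really is width * height tightly packed bytes, which is what the direct cast silently assumes.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ComponentSampleModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.SampleModel;

// Hypothetical helper, not part of PDFBox.
final class RasterLayoutCheck {

    /** True only if the backing buffer is exactly width * height tightly packed bytes. */
    static boolean isPackedByteGray(BufferedImage image) {
        DataBuffer buffer = image.getRaster().getDataBuffer();
        SampleModel model = image.getRaster().getSampleModel();
        if (!(buffer instanceof DataBufferByte) || !(model instanceof ComponentSampleModel)) {
            return false; // e.g. a 16 bit image or a custom buffer implementation
        }
        ComponentSampleModel csm = (ComponentSampleModel) model;
        return csm.getNumBands() == 1
                && csm.getPixelStride() == 1
                && csm.getScanlineStride() == image.getWidth()
                && buffer.getSize() == image.getWidth() * image.getHeight();
    }

    public static void main(String[] args) {
        BufferedImage gray8 = new BufferedImage(8, 4, BufferedImage.TYPE_BYTE_GRAY);
        // A subimage shares the parent's buffer, so each line has extra bytes around it.
        BufferedImage sub = gray8.getSubimage(2, 1, 4, 2);
        BufferedImage gray16 = new BufferedImage(8, 4, BufferedImage.TYPE_USHORT_GRAY);
        System.out.println(isPackedByteGray(gray8));
        System.out.println(isPackedByteGray(sub));
        System.out.println(isPackedByteGray(gray16));
    }
}
```

Only the first image passes the check. The subimage fails because of its line stride, and the 16 bit image fails because its buffer is not a DataBufferByte.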

So the right solution would be to do it like the predictor encoder and use
{code:java}
image.getRaster().getDataElements()
{code}
with the right array type. But you could also simply try to extend the 
PredictorEncoder so that it can also handle grayscale images.
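
A minimal sketch of that row-wise approach for single-band images, using only JDK calls, could look like this. GrayRowWriter and writeGraySamples are hypothetical names, and this is not the actual PredictorEncoder code. It assumes that 16 bit samples should be written high byte first, and it handles only byte and unsigned short transfer types.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not part of PDFBox.
final class GrayRowWriter {

    /** Writes the samples of a single-band image row by row, without stride padding. */
    static void writeGraySamples(BufferedImage image, OutputStream out) throws IOException {
        Raster raster = image.getRaster();
        int w = raster.getWidth();
        int h = raster.getHeight();
        int x0 = raster.getMinX();
        int y0 = raster.getMinY();
        if (raster.getNumDataElements() != 1) {
            throw new IllegalArgumentException("expected a single-band image");
        }
        switch (raster.getTransferType()) {
            case DataBuffer.TYPE_BYTE: {
                byte[] row = new byte[w];
                for (int y = 0; y < h; y++) {
                    // The raster resolves strides and offsets; row holds exactly w samples.
                    raster.getDataElements(x0, y0 + y, w, 1, row);
                    out.write(row);
                }
                break;
            }
            case DataBuffer.TYPE_USHORT: {
                short[] row = new short[w];
                byte[] bytes = new byte[w * 2];
                for (int y = 0; y < h; y++) {
                    raster.getDataElements(x0, y0 + y, w, 1, row);
                    for (int x = 0; x < w; x++) {
                        bytes[2 * x] = (byte) (row[x] >> 8); // high byte first (assumption)
                        bytes[2 * x + 1] = (byte) row[x];
                    }
                    out.write(bytes);
                }
                break;
            }
            default:
                throw new IllegalArgumentException(
                        "unsupported transfer type: " + raster.getTransferType());
        }
    }
}
```

Because the samples go straight to the OutputStream one row at a time, this would also fit the idea from the issue description of writing directly into the filtered stream instead of an intermediate byte array.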

> Reduce im memory buffers when creating grayscale images
> -------------------------------------------------------
>
>                 Key: PDFBOX-4300
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4300
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.11
>            Reporter: Jesse Long
>            Priority: Minor
>              Labels: optimization
>         Attachments: PDFBOX-4300-1.patch
>
>
> LosslessFactory uses ByteArrayOutputStreams when creating PDF image data. 
> First, it creates a BAOS in which to store the data, then a BAOS in which to 
> store the flate encoded data. Finally the flate encoded data is written to 
> the PDImageXObject's stream.
> We could instead create an empty PDStream, give it a filter, and write the 
> image data directly into the stream. We then instantiate a PDImageXObject 
> giving it the already created stream.
> This would dramatically reduce RAM requirement if a scratchfile is in play.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
