[
https://issues.apache.org/jira/browse/PDFBOX-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694272#comment-16694272
]
Daniel Gredler commented on PDFBOX-4300:
----------------------------------------
I was going to create a new issue, but it looks like this may fit here...
I was looking at the {{LosslessFactory}} class today, thinking about using it
mainly with grayscale and bitonal images. Performance was worse than expected,
regardless of the compression level chosen
({{org.apache.pdfbox.filter.deflatelevel}}). Based on some local profiling with
the default compression level, {{createFromGrayImage}} spends about 30% of its
time applying the flate filter and the remaining 70% shuttling pixel data
around ({{getRGB}}, etc.). It seems to me that this method should be able to
assume that the data buffer of the image's raster is a {{DataBufferByte}}, and
just use that buffer directly:
{code:java}
private static PDImageXObject createFromGrayImage(BufferedImage image, PDDocument document)
        throws IOException
{
    byte[] pixels = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
    int bpc = image.getColorModel().getPixelSize();
    return prepareImageXObject(document, pixels,
            image.getWidth(), image.getHeight(), bpc,
            PDDeviceGray.INSTANCE);
}
{code}
As expected, performance improved *drastically* with this change – to roughly
on par with PNG file creation using {{ImageIO.write}}. The output looks good to
the naked eye, but {{LosslessFactoryTest}} fails the grayscale assertion on
line 95, where things seem to be off by a very small amount:
{code:java}
Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.242 s <<< FAILURE! - in org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest
testCreateLosslessFromImageRGB(org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest) Time elapsed: 0.6 s <<< FAILURE!
junit.framework.AssertionFailedError: (3,0) expected: <FFFFFFFF> but was: <FFFEFEFE>; expected:<-1> but was:<-65794>
    at org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest.testCreateLosslessFromImageRGB(LosslessFactoryTest.java:95)
{code}
Does this seem like a valid approach, both to improve performance and reduce
memory usage? If so, any idea why some of the pixels are slightly different
after the change?
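The unconditional cast in the snippet above only holds when the image is backed by a single, tightly packed byte buffer; a sub-image created with {{getSubimage}}, for example, shares its parent's larger buffer. Below is a minimal sketch of a guarded variant that uses only the JDK. The names {{GrayPixelAccess}} and {{grayBytes}} are hypothetical and not part of PDFBox, and the sketch assumes 8-bit, single-band gray input. It does not attempt to explain the off-by-one pixel values reported by the test.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;

// Hypothetical helper, not part of PDFBox.
public class GrayPixelAccess {

    /**
     * Returns the 8-bit gray samples of the image in row-major order.
     * Assumption: the image has one band of 8-bit samples in a gray color space.
     */
    static byte[] grayBytes(BufferedImage image) {
        Raster raster = image.getRaster();
        int width = image.getWidth();
        int height = image.getHeight();

        // Reject anything that is not single-band, 8-bit gray (e.g. indexed or RGB images).
        if (raster.getNumBands() != 1
                || raster.getSampleModel().getSampleSize(0) != 8
                || image.getColorModel().getColorSpace().getType() != ColorSpace.TYPE_GRAY) {
            throw new IllegalArgumentException("expected a single-band, 8-bit gray image");
        }

        // Fast path: hand back the backing array without copying, but only when it
        // holds exactly width * height bytes starting at offset 0.
        DataBuffer buffer = raster.getDataBuffer();
        if (image.getType() == BufferedImage.TYPE_BYTE_GRAY
                && buffer instanceof DataBufferByte
                && buffer.getNumBanks() == 1
                && buffer.getOffset() == 0) {
            byte[] data = ((DataBufferByte) buffer).getData();
            if (data.length == width * height) {
                return data;
            }
        }

        // Slow path: copy the raw samples through the raster API. This still reads
        // raster samples and does not go through getRGB.
        int[] samples = raster.getSamples(raster.getMinX(), raster.getMinY(),
                width, height, 0, (int[]) null);
        byte[] copy = new byte[samples.length];
        for (int i = 0; i < samples.length; i++) {
            copy[i] = (byte) samples[i];
        }
        return copy;
    }
}
```

On the fast path the returned array is the image's own backing store, so the caller must not modify it. The slow path costs one extra copy but handles sub-images and padded rasters.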
> Reduce in memory buffers when creating grayscale images
> -------------------------------------------------------
>
> Key: PDFBOX-4300
> URL: https://issues.apache.org/jira/browse/PDFBOX-4300
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 2.0.11
> Reporter: Jesse Long
> Priority: Minor
> Labels: optimization
> Attachments: PDFBOX-4300-1.patch
>
>
> LosslessFactory uses ByteArrayOutputStreams when creating PDF image data.
> First, it creates a BAOS in which to store the data, then a BAOS in which to
> store the flate encoded data. Finally the flate encoded data is written to
> the PDImageXObject's stream.
> We could instead create an empty PDStream, give it a filter, and write the
> image data directly into the stream. We then instantiate a PDImageXObject
> giving it the already created stream.
> This would dramatically reduce RAM requirement if a scratchfile is in play.
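The idea in the description, compressing while writing instead of buffering the raw and encoded data separately, can be illustrated with the JDK alone. This is a minimal sketch, not the attached patch: {{StreamingFlate}} and {{writeFlated}} are hypothetical names, and the {{target}} stream stands in for whatever output stream the PDF stream would provide.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper, not part of PDFBox.
public class StreamingFlate {

    /**
     * Deflates 8-bit pixel rows straight into the target stream, one row at a
     * time, so no intermediate buffer holds the complete encoded image.
     * The target is left open; the caller is responsible for closing it.
     */
    static void writeFlated(byte[] pixels, int width, int height,
                            OutputStream target, int level) throws IOException {
        Deflater deflater = new Deflater(level);
        try {
            DeflaterOutputStream out = new DeflaterOutputStream(target, deflater);
            for (int y = 0; y < height; y++) {
                out.write(pixels, y * width, width);
            }
            // Flush the remaining compressed data without closing the target.
            out.finish();
        } finally {
            // A caller-supplied Deflater is not released by the stream, so end it here.
            deflater.end();
        }
    }
}
```

With a target backed by a scratch file, only the deflater's working buffers and the current row stay in memory during encoding.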