[ https://issues.apache.org/jira/browse/PDFBOX-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694272#comment-16694272 ]

Daniel Gredler commented on PDFBOX-4300:
----------------------------------------

I was going to create a new issue, but it looks like this may fit here...

I was looking at the {{LosslessFactory}} class today, with a view to using it
mainly for grayscale and bitonal images. Performance was worse than expected,
regardless of the compression level chosen
({{org.apache.pdfbox.filter.deflatelevel}}). Based on some local profiling with
the default compression level, {{createFromGrayImage}} spends about 30% of its
time applying the flate filter and the rest (70%) shuttling pixel data around
({{getRGB}}, etc.). It seems to me that this method should be able to assume
that the data buffer of the image's raster is a {{DataBufferByte}}, and simply
use that buffer directly:
{code:java}
    private static PDImageXObject createFromGrayImage(BufferedImage image, PDDocument document)
            throws IOException
    {
        // Use the raster's backing array directly instead of copying pixels via getRGB().
        // Assumes a DataBufferByte (true for TYPE_BYTE_GRAY and TYPE_BYTE_BINARY images).
        byte[] pixels = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
        int bpc = image.getColorModel().getPixelSize();
        return prepareImageXObject(document, pixels,
                image.getWidth(), image.getHeight(), bpc, PDDeviceGray.INSTANCE);
    }
{code}
As expected, performance improved *drastically* with this change, to roughly
on par with PNG file creation using {{ImageIO.write}}. The output looks good to
the naked eye, but {{LosslessFactoryTest}} fails the grayscale assertion on
line 95, where some pixel values are off by a very small amount:
{code:java}
Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.242 s <<< FAILURE! - in org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest
testCreateLosslessFromImageRGB(org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest)  Time elapsed: 0.6 s  <<< FAILURE!
junit.framework.AssertionFailedError: (3,0) expected: <FFFFFFFF> but was: <FFFEFEFE>;  expected:<-1> but was:<-65794>
        at org.apache.pdfbox.pdmodel.graphics.image.LosslessFactoryTest.testCreateLosslessFromImageRGB(LosslessFactoryTest.java:95)
{code}
Does this seem like a valid approach, both to improve performance and reduce 
memory usage? If so, any idea why some of the pixels are slightly different 
after the change?
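A minimal, JDK-only sketch of the direct-buffer idea, for illustration only. The names {{GraySamples}} and {{graySamples}} are hypothetical and not part of PDFBox. It assumes an 8-bit, single-band gray image such as {{TYPE_BYTE_GRAY}}; 1-bit bitonal images use a packed sample model and are rejected here. Unlike the snippet above, it only hands out the backing array when the raster is laid out as exactly width * height contiguous bytes, and otherwise falls back to a copy (for example, for an image returned by {{getSubimage}}). Like the snippet, it reads raw samples, whereas {{getRGB}} converts through the image's {{ColorModel}} to the default sRGB space, so the two paths are not guaranteed to yield identical bytes.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ComponentSampleModel;
import java.awt.image.DataBufferByte;
import java.awt.image.WritableRaster;

final class GraySamples {
    private GraySamples() {}

    /**
     * Returns the 8-bit gray samples of the image in row-major order.
     * Hypothetical helper: reuses the backing array when its layout already
     * matches a packed width*height byte array, and copies otherwise.
     */
    static byte[] graySamples(BufferedImage image) {
        WritableRaster raster = image.getRaster();
        if (!(raster.getDataBuffer() instanceof DataBufferByte)
                || !(raster.getSampleModel() instanceof ComponentSampleModel)
                || raster.getNumBands() != 1
                || image.getColorModel().getPixelSize() != 8) {
            throw new IllegalArgumentException("expected an 8-bit, single-band byte raster");
        }
        int width = image.getWidth();
        int height = image.getHeight();
        DataBufferByte buffer = (DataBufferByte) raster.getDataBuffer();
        ComponentSampleModel sm = (ComponentSampleModel) raster.getSampleModel();
        byte[] data = buffer.getData();

        // Zero-copy only if rows are contiguous, unpadded and start at index 0
        // (not the case, for example, for an image returned by getSubimage()).
        boolean packed = sm.getPixelStride() == 1
                && sm.getScanlineStride() == width
                && sm.getBandOffsets()[0] == 0
                && raster.getSampleModelTranslateX() == 0
                && raster.getSampleModelTranslateY() == 0
                && buffer.getOffset() == 0
                && data.length == width * height;
        if (packed) {
            return data;
        }
        // Fallback: copy the raw samples; unlike getRGB(), no color conversion is applied.
        return (byte[]) raster.getDataElements(0, 0, width, height, null);
    }
}
```

The zero-copy branch shares the array with the image, so a caller that later modifies the image would also change data that has not yet been encoded; the copy branch trades that risk for one extra allocation.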

> Reduce in-memory buffers when creating grayscale images
> -------------------------------------------------------
>
>                 Key: PDFBOX-4300
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4300
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.11
>            Reporter: Jesse Long
>            Priority: Minor
>              Labels: optimization
>         Attachments: PDFBOX-4300-1.patch
>
>
> LosslessFactory uses ByteArrayOutputStreams when creating PDF image data.
> First, it creates a BAOS in which to store the image data, then a BAOS in
> which to store the flate-encoded data. Finally, the flate-encoded data is
> written to the PDImageXObject's stream.
> We could instead create an empty PDStream, give it a filter, and write the
> image data directly into the stream. We would then instantiate a
> PDImageXObject, passing it the already created stream.
> This would dramatically reduce RAM requirements when a scratch file is in use.
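The quoted proposal relies on PDFBox's own {{PDStream}} and filter handling. As a rough, JDK-only illustration of the same streaming idea, the sketch below flate-encodes 8-bit gray samples row by row straight into a caller-supplied {{OutputStream}} instead of filling two {{ByteArrayOutputStream}}s. {{StreamingGrayEncoder}} and {{writeFlateEncodedGray}} are hypothetical names. In PDFBox the target would presumably be the empty, filtered {{PDStream}}, in which case PDFBox would apply the filter itself and the explicit {{Deflater}} here would not be needed (an assumption; the PDFBox calls are not shown). The {{level}} argument stands in for the {{org.apache.pdfbox.filter.deflatelevel}} setting.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

final class StreamingGrayEncoder {
    private StreamingGrayEncoder() {}

    /**
     * Flate-encodes the 8-bit gray samples of image and writes them straight to out,
     * one row at a time, so neither the raw samples nor the compressed bytes are
     * held in full in memory. The caller owns (and closes) out.
     */
    static void writeFlateEncodedGray(BufferedImage image, OutputStream out, int level)
            throws IOException {
        Raster raster = image.getRaster();
        int width = image.getWidth();
        int height = image.getHeight();
        int[] rowSamples = new int[width];
        byte[] rowBytes = new byte[width];
        Deflater deflater = new Deflater(level); // zlib format, as FlateDecode expects
        try {
            DeflaterOutputStream flateOut = new DeflaterOutputStream(out, deflater);
            for (int y = 0; y < height; y++) {
                // Raw band-0 samples: no ColorModel/sRGB conversion as with getRGB()
                raster.getSamples(0, y, width, 1, 0, rowSamples);
                for (int x = 0; x < width; x++) {
                    rowBytes[x] = (byte) rowSamples[x];
                }
                flateOut.write(rowBytes);
            }
            // Writes the remaining compressed data without closing out
            flateOut.finish();
        } finally {
            // DeflaterOutputStream does not end a caller-supplied Deflater, so release it here
            deflater.end();
        }
    }
}
```

With this shape, the encoder's own memory use scales with one row plus the Deflater's fixed-size internal state, rather than with the whole image.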



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
