Yachun Miao created PDFBOX-3734:
-----------------------------------

             Summary: Out of memory when converting a scanned PDF to an image
                 Key: PDFBOX-3734
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3734
             Project: PDFBox
          Issue Type: Bug
          Components: Rendering
    Affects Versions: 2.0.5
         Environment: Windows 7 64-bit, JDK 1.7 64-bit
            Reporter: Yachun Miao


I have a scanned PDF file of only 2.8 MB. When I try the PDF-to-image feature, 
I get an out-of-memory error (OOM) with -Xmx200m:

{color:red}
        at java.awt.image.DataBufferByte.<init>(DataBufferByte.java:92)
        at java.awt.image.ComponentSampleModel.createDataBuffer(ComponentSampleModel.java:415)
        at sun.awt.image.ByteInterleavedRaster.<init>(ByteInterleavedRaster.java:89)
        at sun.awt.image.ByteInterleavedRaster.createCompatibleWritableRaster(ByteInterleavedRaster.java:1281)
        at sun.awt.image.ByteInterleavedRaster.createCompatibleWritableRaster(ByteInterleavedRaster.java:1292)
        at org.apache.pdfbox.filter.DCTFilter.fromBGRtoRGB(DCTFilter.java:246)
        at org.apache.pdfbox.filter.DCTFilter.decode(DCTFilter.java:171)
        at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
        at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
        at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235)
        at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:124)
        at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
        at org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:409)
        at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:53)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
        at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:206)
        at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145)
{color}

After I increased the JVM maximum heap size to 500 MB, it worked. 

I know PDF rendering is very difficult, but is there some way to avoid 
consuming so much memory? It is still a bit surprising that PDFBox needs 
500 MB of memory to handle one page of a scanned PDF (2.8 MB in total). The 
ratio is around 200 times. 
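
As a rough illustration of where the memory might go: a JPEG-compressed scan 
has to be expanded into an uncompressed raster before it can be drawn, and 
the trace above suggests that fromBGRtoRGB allocates a second raster of the 
same size. The sketch below only does the arithmetic. The page dimensions 
(A4 scanned at 600 dpi) are an assumption for illustration, not measured from 
my file, and the class and method names are hypothetical.

```java
public class RasterMemoryEstimate {

    /** Bytes needed for one uncompressed raster with 8 bits per component. */
    static long rasterBytes(int width, int height, int components) {
        return (long) width * height * components;
    }

    public static void main(String[] args) {
        // Assumed scan size: A4 (8.27 in x 11.69 in) at 600 dpi, 3 colour components.
        int width = 4960;
        int height = 7016;

        long one = rasterBytes(width, height, 3);
        // The stack trace shows a compatible raster being created during the
        // BGR-to-RGB conversion, so two full-size copies may be live at once.
        long two = 2 * one;

        System.out.printf("one raster:  %.1f MiB%n", one / (1024.0 * 1024.0));
        System.out.printf("two rasters: %.1f MiB%n", two / (1024.0 * 1024.0));
    }
}
```

Under these assumed dimensions a single raster is already about 100 MiB, so 
the compressed file size says little about the heap needed to decode it.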

For my purposes, it is fine to reduce the quality of the converted image. 
(Actually, the quality of the original image in the PDF is not good either. 
:)) Tell me if there are such methods. I will help try them out. 
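
To make the kind of quality trade-off I mean concrete, here is a minimal 
sketch using only the standard javax.imageio API, not PDFBox: it decodes an 
encoded image while keeping every n-th pixel, so the destination raster is 
smaller. Whether PDFBox could apply something similar inside DCTFilter is an 
open question on my side. The class name, method name, and test image are 
hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class SubsampledDecode {

    /** Decodes an encoded image, keeping every step-th pixel in both directions. */
    static BufferedImage decodeSubsampled(byte[] encoded, int step) throws IOException {
        try (ImageInputStream in =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(encoded))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageReader found for this data");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                ImageReadParam param = reader.getDefaultReadParam();
                // Same subsampling period horizontally and vertically, no offset.
                param.setSourceSubsampling(step, step, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an embedded scan: a blank 400 x 200 image encoded as JPEG.
        BufferedImage source = new BufferedImage(400, 200, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(source, "jpg", out);

        BufferedImage small = decodeSubsampled(out.toByteArray(), 4);
        System.out.println(small.getWidth() + " x " + small.getHeight());
    }
}
```

With a step of 4 the decoded raster holds one sixteenth of the pixels, at the 
cost of a correspondingly coarser image.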






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

