Hee Jeong Kim created PDFBOX-4598:
-------------------------------------

             Summary: oversized jbig2 decoded result that causing unnecessary 
operation
                 Key: PDFBOX-4598
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4598
             Project: PDFBox
          Issue Type: Bug
          Components: JBIG2
    Affects Versions: 3.0.2 JBIG2
            Reporter: Hee Jeong Kim
         Attachments: sample.pdf, use_packed_raster_to_read_Jbig2_image.patch

Hi! I am using pdfbox 2.0.16 and jbig2-imageio 3.0.2 to read JBIG2 images, and 
I found an issue to report.

It seems that jbig2-imageio creates an oversized BufferedImage, which in turn 
makes pdfbox perform unnecessary operations.


To read a JBIG2 image, pdfbox with jbig2-imageio does the following:

1. Find the JBIG2 ImageReader 
(https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L67)

2. Read the image and get a BufferedImage as the result 
(https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L106)

2-1. JBIG2 ImageIO 3.0.2 gets the decoded bitmap 
(https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/JBIG2ImageReader.java#L249)

2-2. It returns that bitmap as a BufferedImage 
(https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/JBIG2ImageReader.java#L259)
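
For readers who do not want to follow the links, steps 1 and 2 can be 
sketched with the standard javax.imageio API. This is only an illustration 
and not the code in JBIG2Filter: the class and method names are hypothetical, 
and it assumes the jbig2-imageio plugin registers itself under the format 
name "JBIG2".

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper, not part of pdfbox or jbig2-imageio.
final class Jbig2ReadSketch {

    /** Step 1: return the first registered reader for the format, or null if none. */
    static ImageReader findReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext() ? readers.next() : null;
    }

    /** Step 2: decode the first image of the stream and release the reader. */
    static BufferedImage readFirstImage(ImageReader reader, InputStream in) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            reader.setInput(iis);
            return reader.read(0);
        } finally {
            reader.dispose();
        }
    }

    public static void main(String[] args) {
        // Assumption: the plugin uses the format name "JBIG2".
        ImageReader reader = findReader("JBIG2");
        System.out.println(reader == null
                ? "no JBIG2 reader on the classpath"
                : "found " + reader.getClass().getName());
    }
}
```

Without the plugin on the classpath, findReader returns null and main only 
reports that no reader was found.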


The problem is this: at step 2-1, a Bitmap of roughly 59 MB is created for the 
JBIG2 image on the second page of sample.pdf (which is correct), but an 
oversized BufferedImage (roughly 473 MB) is returned at step 2-2.
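
The two sizes differ by a factor of about 8, which is what one would expect 
when 1-bit pixels are stored one per byte. The arithmetic can be sketched as 
follows. The dimensions are hypothetical, chosen only so that the totals land 
near the reported figures; they are not the real dimensions of the image in 
sample.pdf.

```java
// Hypothetical helper for the size arithmetic only.
final class RasterSizeSketch {

    /** Bytes needed for a raster whose rows are padded to whole bytes. */
    static long rasterBytes(int width, int height, int bitsPerPixel) {
        long rowBytes = ((long) width * bitsPerPixel + 7) / 8;
        return rowBytes * height;
    }

    public static void main(String[] args) {
        int width = 20_000;   // hypothetical, not taken from sample.pdf
        int height = 23_600;  // hypothetical, not taken from sample.pdf

        long packed = rasterBytes(width, height, 1);       // 8 pixels per byte
        long bytePerPixel = rasterBytes(width, height, 8); // 1 pixel per byte

        System.out.println(packed + " bytes at 1 bit per pixel");
        System.out.println(bytePerPixel + " bytes at 8 bits per pixel");
        System.out.println("ratio: " + (bytePerPixel / packed));
    }
}
```

With these dimensions the packed form needs 59,000,000 bytes and the 
byte-per-pixel form 472,000,000 bytes.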

I think this is because jbig2-imageio uses a raster based on a 
PixelInterleavedSampleModel and an 8-bit IndexColorModel:
https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/image/Bitmaps.java#L177
https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/image/Bitmaps.java#L286
https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/image/Bitmaps.java#L291
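
To illustrate that layout, here is a minimal sketch built only from 
java.awt.image calls. It is not the code in Bitmaps.java: the class name, the 
method name and the palette order are assumptions. It shows that such an 
image spends one byte per pixel even though only two palette entries are 
used.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.IndexColorModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Hypothetical helper, not the jbig2-imageio implementation.
final class InterleavedLayoutSketch {

    /** Builds a two-colour image that stores one byte per pixel. */
    static BufferedImage createByteIndexed(int width, int height) {
        // One band of TYPE_BYTE samples gives a PixelInterleavedSampleModel.
        WritableRaster raster = Raster.createInterleavedRaster(
                DataBuffer.TYPE_BYTE, width, height, 1, new Point(0, 0));
        // Two-entry palette; the white/black order is an assumption.
        byte[] palette = {(byte) 0xFF, 0x00};
        IndexColorModel colorModel = new IndexColorModel(8, 2, palette, palette, palette);
        return new BufferedImage(colorModel, raster, false, null);
    }

    public static void main(String[] args) {
        BufferedImage image = createByteIndexed(16, 4);
        System.out.println(image.getSampleModel().getClass().getSimpleName());
        System.out.println("pixel size: " + image.getColorModel().getPixelSize());
        System.out.println("buffer bytes: " + image.getRaster().getDataBuffer().getSize());
    }
}
```

For a 16x4 image the data buffer holds 64 bytes, one per pixel.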

This also makes pdfbox check the pixel size of the color model of the 
resulting buffered image,
https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L116

and, since it is not 1, create another BufferedImage of binary type (JBIG2 has 
1-bit depth):
https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L122
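
The extra work described here amounts to a second full-size allocation plus a 
redraw. A rough sketch of that kind of conversion follows. It is written from 
the description above rather than copied from JBIG2Filter, and the class and 
method names are hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not the pdfbox implementation.
final class OneBitConversionSketch {

    /** Returns the image unchanged if it is already 1 bit per pixel, else a redrawn copy. */
    static BufferedImage ensureOneBit(BufferedImage image) {
        if (image.getColorModel().getPixelSize() == 1) {
            return image; // nothing to do, no extra allocation
        }
        // Second allocation: a 1-bit image of the same dimensions.
        BufferedImage packed = new BufferedImage(
                image.getWidth(), image.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D graphics = packed.createGraphics();
        try {
            graphics.drawImage(image, 0, 0, null); // copies and converts every pixel
        } finally {
            graphics.dispose();
        }
        return packed;
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage result = ensureOneBit(gray);
        System.out.println("pixel size after conversion: "
                + result.getColorModel().getPixelSize());
    }
}
```

If the reader already returned a 1-bit image, the first branch would be taken 
and both the allocation and the redraw would be skipped.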

I think we should call createPackedRaster and use the returned raster, which 
is based on MultiPixelPackedSampleModel, together with a 1-bit IndexColorModel, 
since JBIG2 is for bi-level images. Please check the attached patch. I tested 
with the patch, and it seems to work well.
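
For illustration only, the idea can be sketched like this. It is not the 
attached patch: the class name, the method name and the palette order are 
assumptions, and copying the decoded bitmap data into the raster is left out.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.IndexColorModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Hypothetical helper, not the attached patch.
final class PackedLayoutSketch {

    /** Builds a two-colour image that stores eight pixels per byte. */
    static BufferedImage createPackedBinary(int width, int height) {
        // One band at 1 bit per band gives a MultiPixelPackedSampleModel.
        WritableRaster raster = Raster.createPackedRaster(
                DataBuffer.TYPE_BYTE, width, height, 1, 1, new Point(0, 0));
        // Two-entry palette; the white/black order is an assumption.
        byte[] palette = {(byte) 0xFF, 0x00};
        IndexColorModel colorModel = new IndexColorModel(1, 2, palette, palette, palette);
        return new BufferedImage(colorModel, raster, false, null);
    }

    public static void main(String[] args) {
        BufferedImage image = createPackedBinary(16, 4);
        System.out.println(image.getSampleModel().getClass().getSimpleName());
        System.out.println("pixel size: " + image.getColorModel().getPixelSize());
        System.out.println("buffer bytes: " + image.getRaster().getDataBuffer().getSize());
    }
}
```

For a 16x4 image the data buffer holds 8 bytes (2 bytes per row), and the 
color model reports a pixel size of 1.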

You can reproduce this issue with the second page of the sample.pdf file that 
I attached.
You can also download the file from here: 
http://www.newsgn.com/data/newsgn_com/pdf/201802/2018022229524590.pdf



