[jira] [Created] (PDFBOX-5531) wrong image data is extracted from PDF having single image

Komal (Jira) Thu, 20 Oct 2022 08:08:07 -0700

Komal created PDFBOX-5531:
-----------------------------

             Summary: wrong image data is extracted from PDF having single image
                 Key: PDFBOX-5531
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5531
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 2.0.26
            Reporter: Komal



Dear Concerned,

We are trying to extract the image from a PDF that contains a single image with 
the following properties: CCITTFaxDecode filter with G4 compression, 150 dpi. 
However, when the following PDFBox code is used, we get an LZW image at 96 dpi:

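  import java.awt.image.BufferedImage;
  import java.io.File;
  import org.apache.pdfbox.cos.COSName;
  import org.apache.pdfbox.pdmodel.PDDocument;
  import org.apache.pdfbox.pdmodel.PDPage;
  import org.apache.pdfbox.pdmodel.PDPageTree;
  import org.apache.pdfbox.pdmodel.PDResources;
  import org.apache.pdfbox.pdmodel.graphics.PDXObject;
  import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

  // try-with-resources closes the document when we are done
  try (PDDocument document = PDDocument.load(new File(
          "D:\\extractImage\\in\\20211125174048BT Exception Documents.pdf"))) {
      PDPageTree list = document.getPages();
      for (PDPage page : list) {
          PDResources pdResources = page.getResources();
          for (COSName c : pdResources.getXObjectNames()) {
              PDXObject o = pdResources.getXObject(c);
              if (o instanceof PDImageXObject) {
                  // Decode the image XObject into a BufferedImage
                  BufferedImage img = ((PDImageXObject) o).getImage();
              }
          }
      }
  }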
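
For context on the dpi figures: as far as we understand, the resolution is not 
stored in the image data itself but follows from the pixel dimensions and the 
size at which the image is drawn on the page (PDF user space is 72 units per 
inch by default). A minimal sketch of that arithmetic follows. The class name 
DpiSketch, the method effectiveDpi, and the sample numbers are hypothetical and 
are not taken from our file.

```java
class DpiSketch {
    /** Default PDF user space: 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /**
     * Effective resolution along one axis, given the image size in pixels
     * and the size at which it is drawn on the page, in points.
     */
    static double effectiveDpi(int pixels, double drawnSizeInPoints) {
        if (pixels <= 0 || drawnSizeInPoints <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return pixels * POINTS_PER_INCH / drawnSizeInPoints;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 1275 pixels drawn across 612 points (8.5 inches)
        System.out.println(effectiveDpi(1275, 612.0)); // prints 150.0
    }
}
```

A BufferedImage carries no dpi value of its own, so a figure such as 96 dpi may 
come from whatever later writes or displays the image rather than from the 
decoding step.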

 

Also, when we try to get the raw stream bytes of the image using the following 
method, the byte array returned is incorrect.

  PDPage page1 = reader.getPage(pageNumber - 1);
  PDResources pdResources = page1.getResources();
  for (COSName c : pdResources.getXObjectNames()) {
      PDXObject o = pdResources.getXObject(c);
      if (!(o instanceof PDImageXObject)) {
          continue; // not an image XObject, so the cast below would fail
      }
      PDImageXObject ob = (PDImageXObject) o;
      ImageXObject xObj1 = new ImageXObject();
      xObj1.xObject = ob;

      COSStream imageStream = ob.getCOSObject();
      PDStream stream = new PDStream(imageStream);
      // BufferedImage image = ob.getImage();
      byte[] streamDataBuffer = stream.toByteArray();
  }

 

Kindly provide a method that returns the black and white image object and the 
raw stream byte array of the image.
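
To make the request concrete, this is roughly the kind of image object we are 
after. It is a minimal sketch that uses only the standard java.awt API, not 
PDFBox. The class name BilevelSketch and the method toBilevel are hypothetical, 
and the sketch converts an already decoded image rather than reading the 
original stream.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class BilevelSketch {
    /** Redraws the source into a 1-bit-per-pixel (black and white) image. */
    static BufferedImage toBilevel(BufferedImage source) {
        BufferedImage result = new BufferedImage(
                source.getWidth(), source.getHeight(),
                BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = result.createGraphics();
        try {
            // Each source pixel is mapped to black or white in the 1-bit palette
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny illustrative input: one white pixel and one black pixel
        BufferedImage rgb = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        rgb.setRGB(0, 0, 0xFFFFFF);
        rgb.setRGB(1, 0, 0x000000);
        BufferedImage bw = toBilevel(rgb);
        System.out.println(bw.getColorModel().getPixelSize()); // bits per pixel
    }
}
```

This only changes the in-memory pixel format; it does not recover the original 
CCITT encoded bytes, which is the second part of the request.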

Thanks in advance.

Regards,

Komal Walia



