Komal created PDFBOX-5531:
-----------------------------
Summary: wrong image data is extracted from PDF having single image
Key: PDFBOX-5531
URL: https://issues.apache.org/jira/browse/PDFBOX-5531
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.26
Reporter: Komal
Dear Concerned,
We are trying to extract the image from a PDF that contains a single image with
the following properties: CCITTFaxDecode filter (Group 4 compression), 150 dpi.
However, when we use the PDFBox code below, we get an LZW-compressed image at
96 dpi:
PDDocument document = PDDocument.load(
        new File("D:\\extractImage\\in\\20211125174048BT Exception Documents.pdf"));
PDPageTree list = document.getPages();
for (PDPage page : list) {
    PDResources pdResources = page.getResources();
    for (COSName c : pdResources.getXObjectNames()) {
        PDXObject o = pdResources.getXObject(c);
        if (o instanceof org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) {
            BufferedImage img =
                    ((org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) o).getImage();
        }
    }
}
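A note on the resolution figures above: neither a BufferedImage nor a PDF image
XObject stores a dpi value. The effective resolution follows from the image's
pixel dimensions and the size at which the page draws it, with PDF user space
defaulting to 72 units per inch. The sketch below shows only that arithmetic.
The class and method names are hypothetical, and the 1275 px / 612 pt values
are assumed for illustration, not taken from the attached PDF.

```java
// Hypothetical helper, not part of PDFBox: illustrates how an effective
// resolution can be derived from pixel count and drawn size.
final class EffectiveDpi {
    /** Default PDF user space: 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /**
     * @param pixels             image width (or height) in pixels
     * @param drawnSizeInPoints  width (or height) the image occupies on the page, in points
     * @return effective resolution in dots per inch along that axis
     */
    static double dpi(int pixels, double drawnSizeInPoints) {
        if (pixels <= 0 || drawnSizeInPoints <= 0) {
            throw new IllegalArgumentException("pixels and drawn size must be positive");
        }
        return pixels * POINTS_PER_INCH / drawnSizeInPoints;
    }

    public static void main(String[] args) {
        // Assumed example: 1275 pixels drawn across 612 points (8.5 inches).
        System.out.println(dpi(1275, 612.0)); // prints 150.0
    }
}
```

If the extracted file reports 96 dpi, that value may be a default applied by
whatever writes or displays the file rather than something read from the PDF;
this is an assumption and has not been confirmed for this document.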
Also, when we try to get the raw stream bytes of the image using the following
method, the resulting byte array is incorrect:
PDPage page1 = reader.getPage(pageNumber - 1);
PDResources pdResources = page1.getResources();
for (COSName c : pdResources.getXObjectNames()) {
    PDXObject o = pdResources.getXObject(c);
    // Skip non-image XObjects (e.g. forms) so the cast below cannot fail
    if (!(o instanceof PDImageXObject)) {
        continue;
    }
    PDImageXObject ob = (PDImageXObject) o;
    ImageXObject xObj1 = new ImageXObject();
    xObj1.xObject = ob;
    COSStream imageStream = ob.getCOSObject();
    PDStream stream = new PDStream(imageStream);
    // BufferedImage image = ob.getImage();
    byte[] streamDataBuffer = stream.toByteArray();
}
Kindly provide a method that returns the black-and-white image object and the
raw stream byte array of the image.
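To make the first part of this request concrete, here is a minimal sketch of
what a "black and white image object" could mean on the Java side: a 1-bit
BufferedImage of type TYPE_BYTE_BINARY. It uses only the JDK and is not a
PDFBox method. The class name, the luminance weights, and the threshold of 128
are assumptions chosen for illustration.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox: converts any BufferedImage
// to a 1-bit black-and-white image by simple luminance thresholding.
final class BitonalConverter {

    /**
     * @param src        source image of any type
     * @param threshold  luminance (0..255) at or above which a pixel becomes white
     * @return a new TYPE_BYTE_BINARY image with the same dimensions
     */
    static BufferedImage toBitonal(BufferedImage src, int threshold) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (assumed weights)
                int luminance = (r * 299 + g * 587 + b * 114) / 1000;
                dst.setRGB(x, y, luminance >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0xFFFFFF); // white pixel
        src.setRGB(1, 0, 0x000000); // black pixel
        BufferedImage out = toBitonal(src, 128);
        System.out.println(out.getType() == BufferedImage.TYPE_BYTE_BINARY);
    }
}
```

A fixed threshold is the simplest choice and is lossless only when the source
pixels are already pure black or pure white, as would be expected for image
data decoded from Group 4 compression.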
Thanks in advance.
Regards,
Komal Walia