Paulo R C Mello Junior created PDFBOX-1731:
----------------------------------------------
Summary: Converting pdf to Image
Key: PDFBOX-1731
URL: https://issues.apache.org/jira/browse/PDFBOX-1731
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.8.2
Environment: Windows 8 and Linux
JDK 1.7
Reporter: Paulo R C Mello Junior
Attachments: 2514690_5.pdf
I'm trying to convert a pdf page to image but an exception occurs:
17:28:20,652 ERROR [org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap]
(Thread-69) Something went wrong ... the pixelmap doesn't contain any data.
17:28:20,654 WARN [org.apache.pdfbox.util.operator.pagedrawer.Invoke]
(Thread-69) getRGBImage returned NULL
17:28:20,661 INFO [org.apache.pdfbox.util.PDFStreamEngine] (Thread-69)
unsupported/disabled operation: i
17:28:36,809 ERROR [stderr] (Thread-70) Exception in thread "Thread-70"
java.lang.OutOfMemoryError: Java heap space
17:28:36,811 ERROR [stderr] (Thread-70) at
java.awt.image.DataBufferByte.<init>(DataBufferByte.java:92)
17:28:36,812 ERROR [stderr] (Thread-70) at
java.awt.image.ComponentSampleModel.createDataBuffer(ComponentSampleModel.java:415)
17:28:36,814 ERROR [stderr] (Thread-70) at
java.awt.image.Raster.createWritableRaster(Raster.java:941)
17:28:36,814 ERROR [stderr] (Thread-70) at
javax.imageio.ImageTypeSpecifier.createBufferedImage(ImageTypeSpecifier.java:1073)
17:28:36,815 ERROR [stderr] (Thread-70) at
javax.imageio.ImageReader.getDestination(ImageReader.java:2896)
17:28:36,816 ERROR [stderr] (Thread-70) at
com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1066)
17:28:36,817 ERROR [stderr] (Thread-70) at
com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1034)
17:28:36,818 ERROR [stderr] (Thread-70) at
javax.imageio.ImageIO.read(ImageIO.java:1448)
17:28:36,818 ERROR [stderr] (Thread-70) at
javax.imageio.ImageIO.read(ImageIO.java:1352)
17:28:36,819 ERROR [stderr] (Thread-70) at
org.apache.pdfbox.pdmodel.graphics.xobject.PDJpeg.getRGBImage(PDJpeg.java:264)
17:28:36,820 ERROR [stderr] (Thread-70) at
org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:83)
17:28:36,821 ERROR [stderr] (Thread-70) at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
17:28:36,823 ERROR [stderr] (Thread-70) at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
17:28:36,824 ERROR [stderr] (Thread-70) at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
17:28:36,825 ERROR [stderr] (Thread-70) at
org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
17:28:36,826 ERROR [stderr] (Thread-70) at
org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
17:28:36,827 ERROR [stderr] (Thread-70) at
org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:769)
My code:
public static List<BufferedImage> getPdfPagesAsImages(String pdfPath)
throws IOException {
File f = new File(pdfPath);
PDDocument pdfDocument = null;
pdfDocument = PDDocument.loadNonSeq(f, null);
List<BufferedImage> bImages = new ArrayList<BufferedImage>();
try {
System.out.println(pdfPath);
int resolution = 185;
if (pdfDocument != null) {
@SuppressWarnings("unchecked")
List<PDPage> pages = (List<PDPage>) pdfDocument
.getDocumentCatalog().getAllPages();
for (PDPage p : pages) {
BufferedImage convertedImage =
p.convertToImage(
BufferedImage.TYPE_INT_RGB, resolution);
if (isNegativeImage(convertedImage)) {
bImages.add(invertNegativeImage(convertedImage));
} else {
bImages.add(convertedImage);
}
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
e.getMessage();
e.getCause();
} catch (IOException e) {
e.printStackTrace();
e.getMessage();
e.getCause();
} catch (Exception e) {
e.printStackTrace();
e.getMessage();
e.getCause();
} finally {
pdfDocument.close();
}
return bImages;
}
I atached my pdf.
I have e10000 pdf to verify and about 30% throws this kind of exception
--
This message was sent by Atlassian JIRA
(v6.1#6144)