Antonio Contreras created PDFBOX-4201:
-----------------------------------------
Summary: Certain scanned pdfs do not render
Key: PDFBOX-4201
URL: https://issues.apache.org/jira/browse/PDFBOX-4201
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.8
Reporter: Antonio Contreras
Attachments: testDoc2.pdf
I am using PDFBox version 2.0.8. I am trying to render scanned pdfs, but some
of them do not render and result in an error. Native pdfs do not have
any trouble rendering. The majority of the scanned pdfs that I have also render
without trouble, but a couple result in an error (one
is attached).
This is the code I used to render the pdf.
{code:java}
try (PDDocument document = load(file)) {
    logger.debug("start generate image file " + pageNumber + " for " + name);
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    return getPageImage(pdfRenderer, pageNumber, name, storageId);
}
{code}
The above call to getPageImage calls the following code
{code:java}
File imageFile = File.createTempFile(
        StringUtils.toFilename(storageId) + "_" + pageNumber, ".png");
imageFile.deleteOnExit();
final BufferedImage image = pdfRenderer.renderImageWithDPI(
        pageNumber - 1, dpi, ImageType.RGB);
ImageIO.write(image, "png", imageFile);
logger.debug("completed generate image file " + pageNumber + " for " + name);
return imageFile;
{code}
The issue occurs in the second code snippet in the line
{code:java}
final BufferedImage image = pdfRenderer.renderImageWithDPI(
        pageNumber - 1, dpi, ImageType.RGB);
{code}
The stack trace is the following
{code:java}
Caused by: java.io.IOException: Error: Expected operator 'ID' actual='In'
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:305) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:502) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:203) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94) ~[pdfbox-2.0.8.jar:2.0.8]
    at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:70) ~[classes/:?]
    at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:59) ~[classes/:?]
{code}
Since native pdfs rendered without problems, I initially thought that scanned
pdfs in general were the issue. But since other scanned pdfs render fine, I am
uncertain what causes some to render and others to fail.
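One way to narrow this down might be to try every page of a document and record which ones throw, instead of stopping at the first error. The sketch below is only an illustration under assumptions: FailingPageFinder, PageRenderer and findFailingPages are hypothetical names that are not part of PDFBox or of the code above, and the renderer in main is a stand-in that simulates the parser error on one page.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: collects the pages that fail to render.
public class FailingPageFinder {

    /** Hypothetical callback that renders one zero-based page index and may throw. */
    @FunctionalInterface
    public interface PageRenderer {
        void render(int pageIndex) throws IOException;
    }

    /** Tries every page and returns the zero-based indexes that threw an IOException. */
    public static List<Integer> findFailingPages(int pageCount, PageRenderer renderer) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            try {
                renderer.render(i);
            } catch (IOException e) {
                // Record the page and keep going so one bad page does not hide others.
                System.err.println("page index " + i + " failed: " + e.getMessage());
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in renderer that fails on page index 1, mimicking the reported error.
        List<Integer> failed = findFailingPages(3, i -> {
            if (i == 1) {
                throw new IOException("Error: Expected operator 'ID' actual='In'");
            }
        });
        System.out.println("failing page indexes: " + failed);
    }
}
```

With PDFBox 2.0.x the stand-in could presumably be replaced by passing document.getNumberOfPages() as the page count and i -> pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB) as the renderer, which would show whether the error is limited to specific pages of the attached file.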