Antonio Contreras created PDFBOX-4201:
-----------------------------------------

             Summary: Certain scanned pdfs do not render
                 Key: PDFBOX-4201
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4201
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 2.0.8
            Reporter: Antonio Contreras
         Attachments: testDoc2.pdf

I am using PDFBox version 2.0.8 to render scanned PDFs. Native PDFs render 
without trouble, and so do the majority of my scanned PDFs, but a couple of 
scanned PDFs fail with an error (one is attached).

This is the code I used to render the pdf.

 
{code:java}
try (PDDocument document = load(file)) {
    logger.debug("start generate image file " + pageNumber + " for " + name);
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    return getPageImage(pdfRenderer, pageNumber, name, storageId);
}{code}
The call to getPageImage above runs the following code:

 
{code:java}
File imageFile = File.createTempFile(
        StringUtils.toFilename(storageId) + "_" + pageNumber, ".png");
imageFile.deleteOnExit();

final BufferedImage image =
        pdfRenderer.renderImageWithDPI(pageNumber - 1, dpi, ImageType.RGB);
ImageIO.write(image, "png", imageFile);

logger.debug("completed generate image file " + pageNumber + " for " + name);
return imageFile;{code}
The error occurs in the second code snippet, at this line:

 
{code:java}
final BufferedImage image =
        pdfRenderer.renderImageWithDPI(pageNumber - 1, dpi, ImageType.RGB);{code}
 

The stack trace is the following

 
{code:java}
Caused by: java.io.IOException: Error: Expected operator 'ID' actual='In'
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:305) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:502) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:203) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94) ~[pdfbox-2.0.8.jar:2.0.8]
    at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:70) ~[classes/:?]
    at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:59) ~[classes/:?]
{code}
Since native PDFs rendered without problems, I initially thought that scanned 
PDFs in general were the issue. But other scanned PDFs render fine, so I am 
unsure why some render and others fail with this error.
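
To narrow down which pages of a document trigger the error, a small probe 
along the following lines might help. This is only a sketch under stated 
assumptions: PageRenderProbe and PageRenderAction are hypothetical names that 
are not part of PDFBox, and the render call is passed in as a lambda so the 
snippet compiles without the library.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of PDFBox): tries each page on its own
// and records which ones fail instead of stopping at the first error.
final class PageRenderProbe {

    /** Hypothetical stand-in for the actual render call. */
    @FunctionalInterface
    interface PageRenderAction {
        void render(int pageIndex) throws IOException;
    }

    /**
     * Runs the action for every zero-based page index in [0, pageCount).
     * Returns the failures in page order: page index -> exception message.
     */
    static Map<Integer, String> findFailingPages(int pageCount, PageRenderAction action) {
        Map<Integer, String> failures = new LinkedHashMap<>();
        for (int i = 0; i < pageCount; i++) {
            try {
                action.render(i);
            } catch (IOException | RuntimeException e) {
                // The trace above starts with "Caused by", so the IOException
                // may arrive wrapped; runtime exceptions are caught as well.
                failures.put(i, String.valueOf(e.getMessage()));
            }
        }
        return failures;
    }
}
```

With the renderer from the snippets above, the call would presumably look like 
PageRenderProbe.findFailingPages(document.getNumberOfPages(), i -> 
pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB)), and the returned map 
would list the pages whose content streams cannot be parsed.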



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

