[ https://issues.apache.org/jira/browse/PDFBOX-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antonio Contreras updated PDFBOX-4201:
--------------------------------------
    Description: 
I am using PDFBox version 2.0.8. I am trying to render scanned PDFs, but some 
of them fail to render and result in an error. Native PDFs render without any 
trouble. Most of my scanned PDFs also render correctly, but a couple of them 
result in an error (one is attached).

This is the code I use to render the PDF:
{code:java}
try (PDDocument document = load(file)) {
    logger.debug("start generate image file " + pageNumber + " for " + name);
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    return getPageImage(pdfRenderer, pageNumber, name, storageId);
}{code}
The call to getPageImage above runs the following code:
{code:java}
File imageFile = File.createTempFile(StringUtils.toFilename(storageId) + "_" + pageNumber, ".png");
imageFile.deleteOnExit();

final BufferedImage image = pdfRenderer.renderImageWithDPI(pageNumber - 1, dpi, ImageType.RGB);
ImageIO.write(image, "png", imageFile);

logger.debug("completed generate image file " + pageNumber + " for " + name);
return imageFile;{code}
The error occurs in the second code snippet, on this line:
{code:java}
final BufferedImage image = pdfRenderer.renderImageWithDPI(pageNumber - 1, dpi, ImageType.RGB);{code}
 

The stack trace is as follows:
{code:java}
Caused by: java.io.IOException: Error: Expected operator 'ID' actual='In'
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:305) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:502) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:203) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145) ~[pdfbox-2.0.8.jar:2.0.8]
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94) ~[pdfbox-2.0.8.jar:2.0.8]
    at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:70) ~[classes/:?]
    at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:59) ~[classes/:?]
{code}
Because native PDFs rendered without problems, I initially assumed that scanned 
PDFs in general were the issue. However, since other scanned PDFs render 
correctly, I am unsure what causes some to render and others to fail.
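
One way to narrow this down is to attempt every page and record which ones throw, instead of stopping at the first failure. The following is a minimal sketch, not part of the reported code: the `FailingPageFinder` class and its `PageRenderer` interface are hypothetical names introduced here for illustration, so that the loop can be exercised without a PDF at hand.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of PDFBox): collects the pages that fail to render.
public class FailingPageFinder {

    // Assumed abstraction over "render the page at this zero-based index".
    @FunctionalInterface
    public interface PageRenderer {
        BufferedImage render(int pageIndex) throws IOException;
    }

    // Tries pages 0..pageCount-1 and returns the zero-based indexes
    // of every page whose rendering threw an IOException.
    public static List<Integer> findFailingPages(PageRenderer renderer, int pageCount) {
        List<Integer> failing = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            try {
                renderer.render(i);
            } catch (IOException e) {
                // Log with a one-based page number, matching the report's pageNumber.
                System.err.println("page " + (i + 1) + " failed: " + e.getMessage());
                failing.add(i);
            }
        }
        return failing;
    }
}
```

With PDFBox 2.0.x, the renderer argument could be `i -> pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB)` and the page count `document.getNumberOfPages()`. Knowing exactly which pages fail makes it easier to compare their content streams with those of pages that render.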


> Certain scanned pdfs do not render
> ----------------------------------
>
>                 Key: PDFBOX-4201
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4201
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.8
>            Reporter: Antonio Contreras
>            Priority: Major
>         Attachments: testDoc2.pdf
>
>
> I am using PDFBox version 2.0.8. I am trying to render scanned PDFs, but some 
> of them fail to render and result in an error. Native PDFs render without any 
> trouble. Most of my scanned PDFs also render correctly, but a couple of them 
> result in an error (one is attached).
> This is the code I use to render the PDF:
> {code:java}
> try (PDDocument document = load(file)) {
>     logger.debug("start generate image file " + pageNumber + " for " + name);
>     PDFRenderer pdfRenderer = new PDFRenderer(document);
>     return getPageImage(pdfRenderer, pageNumber, name, storageId);
> }{code}
> The call to getPageImage above runs the following code:
> {code:java}
> File imageFile = File.createTempFile(StringUtils.toFilename(storageId) + "_" + pageNumber, ".png");
> imageFile.deleteOnExit();
> final BufferedImage image = pdfRenderer.renderImageWithDPI(pageNumber - 1, dpi, ImageType.RGB);
> ImageIO.write(image, "png", imageFile);
> logger.debug("completed generate image file " + pageNumber + " for " + name);
> return imageFile;{code}
> The error occurs in the second code snippet, on this line:
> {code:java}
> final BufferedImage image = pdfRenderer.renderImageWithDPI(pageNumber - 1, dpi, ImageType.RGB);{code}
>  
> The stack trace is as follows:
> {code:java}
> Caused by: java.io.IOException: Error: Expected operator 'ID' actual='In'
>     at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:305) ~[pdfbox-2.0.8.jar:2.0.8]
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:502) ~[pdfbox-2.0.8.jar:2.0.8]
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469) ~[pdfbox-2.0.8.jar:2.0.8]
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150) ~[pdfbox-2.0.8.jar:2.0.8]
>     at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:203) ~[pdfbox-2.0.8.jar:2.0.8]
>     at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145) ~[pdfbox-2.0.8.jar:2.0.8]
>     at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94) ~[pdfbox-2.0.8.jar:2.0.8]
>     at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:70) ~[classes/:?]
>     at com.sustain.document.PdfPageGenerator.getPageImage(PdfPageGenerator.java:59) ~[classes/:?]
> {code}
> Because native PDFs rendered without problems, I initially assumed that 
> scanned PDFs in general were the issue. However, since other scanned PDFs 
> render correctly, I am unsure what causes some to render and others to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
