[ https://issues.apache.org/jira/browse/PDFBOX-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913572#comment-13913572 ]
John Hewson commented on PDFBOX-1916:
-------------------------------------
The 1752-3u.pdf file has many syntax errors related to streams. Adobe Acrobat
Pro reports dozens of streams in the file as incorrect.
Having said that, the code in TIFFFaxDecoder looks suspicious, but I can't
follow its logic. Instead I've added a workaround in revision 1572283, and
with it I see no problems with page 8 of the file.
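
For illustration only: the comment does not say what the workaround in
revision 1572283 actually does. One generic way to tolerate a decoder that
overruns an array on malformed input is to catch the exception and keep
whatever output was produced up to that point. The sketch below assumes that
approach; `Decoder` and `decodeLeniently` are hypothetical names, not PDFBox
API.

```java
final class TolerantDecode {
    /** Hypothetical stand-in for a decoder such as TIFFFaxDecoder; not a PDFBox type. */
    interface Decoder {
        void decode(byte[] compressed, byte[] out);
    }

    /**
     * Runs the decoder. If it overruns an array on malformed input, the bytes
     * already written to {@code out} are kept and {@code false} is returned.
     */
    static boolean decodeLeniently(Decoder decoder, byte[] compressed, byte[] out) {
        try {
            decoder.decode(compressed, out);
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Malformed stream: keep the partial image rather than abort rendering.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] out = new byte[2];
        // Toy decoder that writes one byte, then reads past the end of its input.
        boolean clean = decodeLeniently((src, dst) -> {
            dst[0] = 1;
            dst[1] = src[4];
        }, new byte[4], out);
        System.out.println("clean=" + clean + ", first byte=" + out[0]);
    }
}
```

Catching the exception hides the underlying decoder bug, so a caller would
normally also log a warning when `decodeLeniently` returns false.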
> java.lang.ArrayIndexOutOfBoundsException in inlineimage
> -------------------------------------------------------
>
> Key: PDFBOX-1916
> URL: https://issues.apache.org/jira/browse/PDFBOX-1916
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
> Assignee: John Hewson
> Priority: Minor
> Labels: ccitt
> Fix For: 2.0.0
>
> Attachments: 1752-3u.pdf
>
>
> I get this with page 8 of the attached file:
> 13.02.2014 20:10:10.809 WARN [main] org.apache.pdfbox.util.PDFStreamEngine:546 - java.lang.ArrayIndexOutOfBoundsException: 4
> java.lang.ArrayIndexOutOfBoundsException: 4
> at org.apache.pdfbox.filter.TIFFFaxDecoder.decodeT6(TIFFFaxDecoder.java:1153)
> at org.apache.pdfbox.filter.CCITTFaxDecodeFilter.decode(CCITTFaxDecodeFilter.java:126)
> at org.apache.pdfbox.pdmodel.graphics.xobject.PDInlinedImage.createImage(PDInlinedImage.java:161)
> at org.apache.pdfbox.util.operator.pagedrawer.BeginInlineImage.process(BeginInlineImage.java:60)
> at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:533)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:261)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:236)
> at org.apache.pdfbox.pdfviewer.PageDrawer.drawType3String(PageDrawer.java:444)
> at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:295)
> at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:489)
> at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:44)
> at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:533)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:261)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:227)
> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:209)
> at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:151)
> at org.apache.pdfbox.util.RenderUtil.renderPage(RenderUtil.java:212)
> at org.apache.pdfbox.util.RenderUtil.convertToImage(RenderUtil.java:177)
> at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:273)
> at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:77)
> Some observations:
> - I can't tell which image is missing from the rendered page.
> - The data read between ID and EI (see
> http://www.verypdf.com/document/pdf-format-reference/pg_0352.htm ) includes
> the LF (0x0A). I tried removing it while debugging, but the exception still
> occurred.
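
The second observation can be made concrete with a small sketch. In the PDF
format the ID operator is followed by a single white-space byte before the
binary image data starts (for filters other than ASCIIHexDecode and
ASCII85Decode), so a parser that captures everything after ID would normally
drop that one byte. The report does not say whether the LF in question is this
leading separator or a byte just before EI; the sketch assumes the leading
case. `InlineImageData` and its methods are hypothetical helpers, not PDFBox
API.

```java
import java.util.Arrays;

final class InlineImageData {
    /** PDF white-space bytes: NUL, TAB, LF, FF, CR, SPACE. */
    static boolean isPdfWhitespace(byte b) {
        return b == 0x00 || b == 0x09 || b == 0x0A
            || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /**
     * Returns a copy of the bytes captured after ID with at most one leading
     * white-space separator removed. Trailing bytes are left alone, because a
     * final 0x0A may be genuine image data.
     */
    static byte[] stripLeadingSeparator(byte[] raw) {
        if (raw.length > 0 && isPdfWhitespace(raw[0])) {
            return Arrays.copyOfRange(raw, 1, raw.length);
        }
        return raw.clone();
    }

    public static void main(String[] args) {
        // Made-up bytes: LF separator, two data bytes, then a data byte that is also 0x0A.
        byte[] raw = {0x0A, 0x26, (byte) 0xA0, 0x0A};
        byte[] data = stripLeadingSeparator(raw);
        System.out.println(data.length + " data bytes");
    }
}
```

Only one byte is removed on purpose: stripping every leading white-space byte
could eat real image data that happens to start with 0x0A or 0x20.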