[ https://issues.apache.org/jira/browse/PDFBOX-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-3292.
----------------------------------------
    Resolution: Fixed

COSParser#checkXRefStreamOffset only checked whether a dictionary can be found
at the given offset. In rare cases, such as the given PDFs, there is a
dictionary at that offset, but it is not the one we are looking for. I've
therefore implemented an additional check to verify that the dictionary at the
given offset is the right one, and now everything works fine.
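
To illustrate the difference between the two checks, here is a minimal,
standalone sketch. It is not the actual PDFBox code: the class and method
names are hypothetical, and it assumes that "the right one" means a dictionary
declaring /Type /XRef, as cross-reference streams do in the PDF specification.
It also scans text naively and does not handle nested dictionaries or comments.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from COSParser.
final class XrefOffsetCheck {

    // "N G obj" header followed by the start of a dictionary.
    private static final Pattern OBJ_WITH_DICT =
            Pattern.compile("\\A\\s*\\d+\\s+\\d+\\s+obj\\s*<<");

    // Assumption: an XRef stream dictionary is recognised by /Type /XRef.
    private static final Pattern XREF_TYPE =
            Pattern.compile("/Type\\s*/XRef\\b");

    /** Weak check: is there any object with a dictionary at the offset? */
    static boolean hasDictionaryAt(byte[] pdf, int offset) {
        return dictionaryTextAt(pdf, offset) != null;
    }

    /** Stricter check: the dictionary must also declare /Type /XRef. */
    static boolean hasXrefStreamDictionaryAt(byte[] pdf, int offset) {
        String dict = dictionaryTextAt(pdf, offset);
        return dict != null && XREF_TYPE.matcher(dict).find();
    }

    /** Returns the text from "<<" up to "stream" or "endobj", or null. */
    private static String dictionaryTextAt(byte[] pdf, int offset) {
        if (offset < 0 || offset >= pdf.length) {
            return null;
        }
        String tail = new String(pdf, offset, pdf.length - offset,
                StandardCharsets.ISO_8859_1);
        Matcher m = OBJ_WITH_DICT.matcher(tail);
        if (!m.find()) {
            return null;
        }
        int start = m.end() - 2; // position of "<<"
        int end = tail.length();
        int stream = tail.indexOf("stream", start);
        int endobj = tail.indexOf("endobj", start);
        if (stream >= 0) {
            end = Math.min(end, stream);
        }
        if (endobj >= 0) {
            end = Math.min(end, endobj);
        }
        return tail.substring(start, end);
    }

    public static void main(String[] args) {
        String s = "%PDF-1.5\n"
                + "1 0 obj\n<< /Type /Catalog >>\nendobj\n"
                + "2 0 obj\n<< /Type /XRef /Size 3 >>\nstream\nxx\nendstream\nendobj\n";
        byte[] pdf = s.getBytes(StandardCharsets.ISO_8859_1);
        int wrong = s.indexOf("1 0 obj"); // a dictionary, but not an XRef stream
        int right = s.indexOf("2 0 obj"); // the XRef stream dictionary
        System.out.println(hasDictionaryAt(pdf, wrong));           // true
        System.out.println(hasXrefStreamDictionaryAt(pdf, wrong)); // false
        System.out.println(hasXrefStreamDictionaryAt(pdf, right)); // true
    }
}
```

In this sketch the weak check accepts the offset of the catalog object, which
is the kind of false positive described above, while the stricter check
rejects it.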

[[email protected]] Thanks for the report!

> Error reading stream, expected='endstream' actual='' in non-truncated files
> ---------------------------------------------------------------------------
>
>                 Key: PDFBOX-3292
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3292
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Tim Allison
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>             Fix For: 2.0.1, 2.1.0
>
>
> When PDF files are truncated, one of the most common exceptions in PDFBox 
> 2.0.0 is:
> {noformat}
> java.io.IOException: Error reading stream, expected='endstream' actual='' at offset 165888
>       at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:999)
>       at org.apache.pdfbox.pdfparser.COSParser.parseXrefObjStream(COSParser.java:326)
>       at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:287)
>       at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:192)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:847)
> {noformat}
> There are two files in govdocs1 that are NOT truncated and trigger this 
> exception in 2.0.0, but were parsed by PDFBox 1.8.11 with the classic parser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
