[
https://issues.apache.org/jira/browse/PDFBOX-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255779#comment-14255779
]
Tilman Hausherr commented on PDFBOX-2527:
-----------------------------------------
What I mean is that you catch the exception thrown by the nonSeq parser, and only
then use the old parser.
The attached PDF isn't just "bad" in the sense that some developer didn't
implement the specification 100%. The PDF has trash at the end, maybe as the
result of a corrupted file system.
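A minimal sketch of the catch-and-fall-back approach suggested in this comment, in plain Java. `DocumentParser`, `parseWithFallback` and the two lambdas are hypothetical stand-ins for the non-sequential and old parsers, not PDFBox API; in real code each lambda would wrap the corresponding parser call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ParserFallback {

    /** Hypothetical stand-in for a PDF parser; not a PDFBox type. */
    @FunctionalInterface
    public interface DocumentParser {
        String parse(byte[] data) throws IOException;
    }

    /**
     * Runs the primary (non-sequential) parser and, only if it throws an
     * IOException, retries the same bytes with the fallback (old) parser.
     * If both fail, the fallback's exception is rethrown with the primary's
     * exception attached as suppressed, so neither cause is lost.
     */
    public static String parseWithFallback(byte[] data,
                                           DocumentParser primary,
                                           DocumentParser fallback) throws IOException {
        try {
            return primary.parse(data);
        } catch (IOException primaryFailure) {
            try {
                return fallback.parse(data);
            } catch (IOException fallbackFailure) {
                // addSuppressed(self) is illegal, so guard against a shared instance.
                if (fallbackFailure != primaryFailure) {
                    fallbackFailure.addSuppressed(primaryFailure);
                }
                throw fallbackFailure;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] damaged = "%PDF-1.4 ... trailing trash".getBytes(StandardCharsets.US_ASCII);

        // Simulates the failure from this issue in the non-sequential parser.
        DocumentParser nonSeq = data -> {
            throw new IOException("Negative seek offset");
        };
        DocumentParser oldParser = data -> "parsed by the old parser";

        System.out.println(parseWithFallback(damaged, nonSeq, oldParser));
    }
}
```

Catching only `IOException` (rather than `Exception`) is a deliberate choice in this sketch: programming errors such as a `NullPointerException` in the primary parser still surface instead of being silently masked by the fallback.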
> IOException: Negative seek offset in NonSequentialPDFParser
> -----------------------------------------------------------
>
> Key: PDFBOX-2527
> URL: https://issues.apache.org/jira/browse/PDFBOX-2527
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.8, 2.0.0
> Reporter: Tilman Hausherr
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Fix For: 1.8.9, 2.0.0
>
> Attachments: PDFBOX-2527-069020.pdf
>
>
> {code}
> Exception in thread "main" java.io.IOException: Negative seek offset
> at java.io.RandomAccessFile.seek(Native Method)
> at org.apache.pdfbox.io.RandomAccessBufferedFileInputStream.seek(RandomAccessBufferedFileInputStream.java:116)
> at org.apache.pdfbox.io.PushBackInputStream.seek(PushBackInputStream.java:234)
> at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:492)
> at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:1013)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:951)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:897)
> at org.apache.pdfbox.tools.PDFReader.parseDocument(PDFReader.java:375)
> at org.apache.pdfbox.tools.PDFReader.openPDFFile(PDFReader.java:340)
> at org.apache.pdfbox.tools.PDFReader.main(PDFReader.java:326)
> at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:80)
> {code}
> This happens with several malformed PDFs from the test set in TIKA-1442.
> These files (303385, 069020, 742141, 982996) all have some trash at
> the end.
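Since the failure is a negative offset reaching `RandomAccessFile.seek`, one defensive idea is to locate the `startxref` value in a way that tolerates trailing trash and to range-check it before any seek. The sketch below is a hypothetical illustration of that idea, not how `NonSequentialPDFParser` actually works; `StartXrefLocator` and its methods are invented names, and it assumes the whole file is in a `byte[]` (a real parser would read only a tail window of a large file).

```java
import java.nio.charset.StandardCharsets;

public class StartXrefLocator {

    // The keyword that precedes the byte offset of the last cross-reference section.
    private static final byte[] KEYWORD = "startxref".getBytes(StandardCharsets.US_ASCII);

    /**
     * Scans backward from the end of the data for "startxref", so bytes appended
     * after %%EOF are skipped over. The decimal number after the keyword is only
     * accepted if it lies in [0, data.length); otherwise the scan continues toward
     * the start of the file. Returns -1 when no usable offset is found.
     */
    public static long findStartXref(byte[] data) {
        for (int i = data.length - KEYWORD.length; i >= 0; i--) {
            if (!matchesAt(data, i)) {
                continue;
            }
            long offset = parseOffset(data, i + KEYWORD.length);
            // Never hand a negative or out-of-range value to seek().
            if (offset >= 0 && offset < data.length) {
                return offset;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < KEYWORD.length; k++) {
            if (data[pos + k] != KEYWORD[k]) {
                return false;
            }
        }
        return true;
    }

    // Skips PDF whitespace, then reads unsigned decimal digits; -1 if there are none.
    private static long parseOffset(byte[] data, int pos) {
        while (pos < data.length && isPdfWhitespace(data[pos])) {
            pos++;
        }
        long value = 0;
        int digits = 0;
        while (pos < data.length && data[pos] >= '0' && data[pos] <= '9') {
            value = value * 10 + (data[pos] - '0');
            if (value > Integer.MAX_VALUE) {
                return -1; // cannot be a valid index into a byte[]; also avoids overflow
            }
            pos++;
            digits++;
        }
        return digits > 0 ? value : -1;
    }

    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    private static boolean isPdfWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    public static void main(String[] args) {
        String pdf = "%PDF-1.4\nxref\nstartxref\n9\n%%EOF\n";
        // Simulated trash after %%EOF, including a bogus startxref value.
        String trashed = pdf + "startxref\n99999\n\u0000trash";
        System.out.println(findStartXref(trashed.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

With the simulated trash in `main`, the bogus `99999` is rejected as out of range and the earlier, valid offset `9` is returned; a value such as `-5` is never parsed as an offset at all, because the minus sign is not a digit.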
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)