[ 
https://issues.apache.org/jira/browse/PDFBOX-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228975#comment-14228975
 ] 

Andreas Lehmkühler commented on PDFBOX-2527:
--------------------------------------------

It looks like the PDF is truncated somewhere in the middle. I'm working on an 
improved self-repair mechanism, but as long as the parser can't skip the 
corrupt parts, the file won't render.
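
To illustrate the kind of defensive check a self-repair step could make, here 
is a minimal sketch. It is not PDFBox's implementation and the thread does not 
state the exact cause of the negative offset. The class StartXrefCheck and its 
methods are hypothetical names. The sketch assumes the whole file fits in a 
byte array; it looks for the last "startxref" keyword (which may be followed 
by trash) and rejects an offset that is missing or points outside the file, 
so that a caller never seeks to an invalid position.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: validates the startxref offset. */
public class StartXrefCheck {
    private static final byte[] KEYWORD =
            "startxref".getBytes(StandardCharsets.ISO_8859_1);

    /**
     * Returns the offset written after the last "startxref" keyword,
     * or -1 if the keyword is missing or the offset lies outside the file.
     */
    public static long findStartXref(byte[] pdf) {
        // Search backwards so that trailing garbage after %%EOF is skipped.
        for (int i = pdf.length - KEYWORD.length; i >= 0; i--) {
            if (matchesAt(pdf, i)) {
                return parseOffset(pdf, i + KEYWORD.length);
            }
        }
        return -1; // keyword not found, e.g. the file is truncated
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < KEYWORD.length; k++) {
            if (data[pos + k] != KEYWORD[k]) {
                return false;
            }
        }
        return true;
    }

    private static long parseOffset(byte[] data, int pos) {
        // Skip the whitespace between the keyword and the number.
        while (pos < data.length && isWhitespace(data[pos])) {
            pos++;
        }
        long value = 0;
        int digits = 0;
        while (pos < data.length && data[pos] >= '0' && data[pos] <= '9') {
            value = value * 10 + (data[pos] - '0');
            if (value > data.length) {
                return -1; // offset points beyond the end of the file
            }
            pos++;
            digits++;
        }
        if (digits == 0) {
            return -1; // no number after the keyword
        }
        return value < data.length ? value : -1;
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\n' || b == '\r'
                || b == '\t' || b == '\f' || b == 0;
    }
}
```

A caller would treat -1 as a signal to fall back to some other recovery 
strategy (for example, scanning the file for objects) instead of seeking.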

> IOException: Negative seek offset in NonSequentialPDFParser
> -----------------------------------------------------------
>
>                 Key: PDFBOX-2527
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2527
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.8, 2.0.0
>            Reporter: Tilman Hausherr
>            Priority: Minor
>         Attachments: PDFBOX-2527-069020.pdf
>
>
> {code}
> Exception in thread "main" java.io.IOException: Negative seek offset
>       at java.io.RandomAccessFile.seek(Native Method)
>       at org.apache.pdfbox.io.RandomAccessBufferedFileInputStream.seek(RandomAccessBufferedFileInputStream.java:116)
>       at org.apache.pdfbox.io.PushBackInputStream.seek(PushBackInputStream.java:234)
>       at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:492)
>       at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:1013)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:951)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:897)
>       at org.apache.pdfbox.tools.PDFReader.parseDocument(PDFReader.java:375)
>       at org.apache.pdfbox.tools.PDFReader.openPDFFile(PDFReader.java:340)
>       at org.apache.pdfbox.tools.PDFReader.main(PDFReader.java:326)
>       at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:80)
> {code}
> This happens with several malformed PDFs from the test set in TIKA-1442. 
> These files (303385, 069020, 303385, 742141, 982996) all have some trash at 
> the end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)