[ 
https://issues.apache.org/jira/browse/PDFBOX-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088546#comment-16088546
 ] 

Tilman Hausherr commented on PDFBOX-3870:
-----------------------------------------

Something is messed up with that file. startxref is 42461 but the file size is 
only 37934 bytes, so the xref offset points past the end of the file. The file 
is very recent. It was created with Apache FOP 1.0. Maybe tell your client to 
try the current FOP version. Or find out whether a data block was lost without 
this being noticed.
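
A rough way to reproduce the check described above, as a minimal sketch only: 
it reads the number after the last "startxref" keyword and compares it with 
the file size. The class and method names (StartXrefCheck, findStartXref, 
isOffsetInsideFile) are made up for illustration and are not part of PDFBox; 
it uses plain JDK calls and assumes the file fits in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a PDFBox class.
class StartXrefCheck {

    /** Returns the number after the last "startxref" keyword, or -1 if none is found. */
    static long findStartXref(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int pos = text.lastIndexOf("startxref");
        if (pos < 0) {
            return -1;
        }
        int i = pos + "startxref".length();
        // Skip the whitespace / line break between the keyword and the number.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            return -1; // keyword present but no digits after it
        }
        try {
            return Long.parseLong(text.substring(start, i));
        } catch (NumberFormatException e) {
            return -1; // digit run too long to be a sane offset
        }
    }

    /** True if the offset could point at a byte that actually exists in the file. */
    static boolean isOffsetInsideFile(long startXref, long fileSize) {
        return startXref >= 0 && startXref < fileSize;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StartXrefCheck <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        long offset = findStartXref(pdf);
        System.out.println("startxref = " + offset + ", file size = " + pdf.length);
        System.out.println(isOffsetInsideFile(offset, pdf.length)
                ? "offset is inside the file"
                : "offset is outside the file (truncated or damaged?)");
    }
}
```

With the numbers from this issue, isOffsetInsideFile(42461, 37934) is false, 
which is consistent with a file that lost data somewhere along the way.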

> Wrong type of referenced length in COSParser
> --------------------------------------------
>
>                 Key: PDFBOX-3870
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3870
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.6
>            Reporter: Jorge Spinsanti
>         Attachments: COSParserIOException.pdf
>
>
> I got an exception when extracting text from a PDF with Tika (the exception 
> is thrown in PDFBox code):
> {code}
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@2be78cf6
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> Caused by: java.io.IOException: Wrong type of referenced length object COSObject{11, 0}: COSNull
>       at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:908)
>       at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:950)
>       at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:781)
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:742)
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:673)
>       at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:633)
>       at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:241)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:276)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1132)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1066)
>       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:141)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ... 24 more
> {code}



