[
https://issues.apache.org/jira/browse/PDFBOX-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088546#comment-16088546
]
Tilman Hausherr commented on PDFBOX-3870:
-----------------------------------------
Something is wrong with that file: startxref is 42461, but the file size is
only 37934 bytes, so the cross-reference offset points past the end of the
file. The file is very recent and was created with Apache FOP 1.0. Maybe tell
your client to try the current FOP version, or find out whether a block of
data was lost without anyone noticing.
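
The size check described above can be reproduced without PDFBox. The following
is only a minimal sketch using the JDK: the class name StartXrefCheck, its
method names, and the 2048-byte tail window are assumptions made for
illustration, and this is not how COSParser itself locates the xref table. It
reads the end of the file, parses the number after the last "startxref"
keyword, and compares it with the file length.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox or Tika.
public class StartXrefCheck {

    /**
     * Returns the number that follows the last "startxref" keyword in the
     * given bytes, or -1 if the keyword or the number is missing.
     */
    public static long findStartXref(byte[] tail) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        int pos = idx + "startxref".length();
        // Skip the whitespace/EOL between the keyword and the offset.
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            return -1;
        }
        try {
            return Long.parseLong(text.substring(start, pos));
        } catch (NumberFormatException e) {
            return -1; // digit run too long to be a valid offset
        }
    }

    /** An offset can only be valid if it lies inside the file. */
    public static boolean isPlausible(long startXref, long fileSize) {
        return startXref >= 0 && startXref < fileSize;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StartXrefCheck <file.pdf>");
            return;
        }
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
            long size = raf.length();
            // Assumption: the trailer sits within the last 2048 bytes.
            int tailLen = (int) Math.min(size, 2048);
            byte[] tail = new byte[tailLen];
            raf.seek(size - tailLen);
            raf.readFully(tail);
            long startXref = findStartXref(tail);
            System.out.println("file size = " + size + ", startxref = " + startXref);
            System.out.println(isPlausible(startXref, size)
                    ? "startxref lies inside the file"
                    : "startxref is missing or points past the end of the file");
        }
    }
}
```

With the numbers quoted above, isPlausible(42461, 37934) returns false, which
is consistent with a truncated or otherwise damaged file.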
> Wrong type of referenced length in COSParser
> --------------------------------------------
>
> Key: PDFBOX-3870
> URL: https://issues.apache.org/jira/browse/PDFBOX-3870
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.6
> Reporter: Jorge Spinsanti
> Attachments: COSParserIOException.pdf
>
>
> I got an exception when extracting text from a PDF with Tika (the exception
> is thrown in PDFBox code):
> {code}
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@2be78cf6
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> Caused by: java.io.IOException: Wrong type of referenced length object COSObject{11, 0}: COSNull
>     at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:908)
>     at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:950)
>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:781)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:742)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:673)
>     at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:633)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:241)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:276)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1132)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1066)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:141)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 24 more
> {code}