[ https://issues.apache.org/jira/browse/TIKA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070332#comment-16070332 ]
Tim Allison commented on TIKA-2406:
-----------------------------------
Thank you for sharing. This file is corrupt, as you noted. Please do the same
thing with it that you did with TIKA-2407. I wasn't able to get anything out
of this file, even with the legacy 1.8.x branch. Is your request for a
clearer exception? Or, how should this document be handled?
> IllegalArgumentException in text extraction from PDF file
> ---------------------------------------------------------
>
> Key: TIKA-2406
> URL: https://issues.apache.org/jira/browse/TIKA-2406
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.15
> Reporter: Jorge Spinsanti
> Attachments: IllegalArgumentException.pdf
>
>
> I got an IllegalArgumentException in text extraction from PDF file (attached):
> {code}
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@d71dc5e
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 16 more
> Caused by: java.lang.IllegalArgumentException: root cannot be null
>     at org.apache.pdfbox.pdmodel.PDPageTree.<init>(PDPageTree.java:75)
>     at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getPages(PDDocumentCatalog.java:129)
>     at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:1381)
>     at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:235)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:146)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 23 more
> {code}
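For callers who hit a trace like the one above, here is a minimal sketch of one way to surface the innermost cause on the calling side until a clearer exception exists. It is not Tika's or PDFBox's own handling. The class and method names (RootCauseSketch, rootCause, describe) are hypothetical, and a plain java.lang.Exception stands in for TikaException so the snippet runs with only the JDK.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not part of Tika or PDFBox.
class RootCauseSketch {

    // Follows getCause() to the innermost exception; the identity set
    // guards against a cyclic cause chain.
    static Throwable rootCause(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    // Builds a one-line summary suitable for a log message.
    static String describe(Throwable t) {
        Throwable root = rootCause(t);
        return root.getClass().getName() + ": " + root.getMessage();
    }

    public static void main(String[] args) {
        // Assumption: a plain Exception stands in for TikaException here,
        // wrapping the same inner exception that the trace reports.
        Exception wrapped = new Exception(
                "Unexpected RuntimeException from parser",
                new IllegalArgumentException("root cannot be null"));
        // prints "java.lang.IllegalArgumentException: root cannot be null"
        System.out.println(describe(wrapped));
    }
}
```

In real code the same helper could be called from a catch block around the parse call, so the log shows the PDFBox message directly instead of only the generic wrapper.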
--
This message was sent by Atlassian JIRA (v6.4.14#64029)