[jira] [Commented] (TIKA-2406) IllegalArgumentException in text extraction from PDF file

Tim Allison (JIRA) Fri, 30 Jun 2017 09:00:29 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070332#comment-16070332
 ]


Tim Allison commented on TIKA-2406:
-----------------------------------

Thank you for sharing.  The file is corrupt, as you noted.  Please do the same 
thing with it that you did with TIKA-2407.  I wasn't able to get anything out 
of this file, even with the legacy 1.8.x branch.  Is your request for a 
clearer exception?  Or, how should this document be handled?

> IllegalArgumentException in text extraction from PDF file
> ---------------------------------------------------------
>
>                 Key: TIKA-2406
>                 URL: https://issues.apache.org/jira/browse/TIKA-2406
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15
>            Reporter: Jorge Spinsanti
>         Attachments: IllegalArgumentException.pdf
>
>
> I got an IllegalArgumentException in text extraction from PDF file (attached):
> {code}
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@d71dc5e
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ... 16 more
> Caused by: java.lang.IllegalArgumentException: root cannot be null
>       at org.apache.pdfbox.pdmodel.PDPageTree.<init>(PDPageTree.java:75)
>       at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getPages(PDDocumentCatalog.java:129)
>       at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:1381)
>       at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:235)
>       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:146)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ... 23 more
> {code}
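
The trace above shows the useful message ("root cannot be null") buried one
level below the wrapping TikaException.  As a rough illustration of how a
caller might surface that innermost cause, here is a minimal sketch using only
the JDK.  The class and method names (CauseChain, rootCause, describe) are
hypothetical and not part of Tika or PDFBox, and a plain java.lang.Exception
stands in for TikaException so that the example has no external dependencies.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not part of Tika or PDFBox.
final class CauseChain {
    private CauseChain() {}

    /** Follows getCause() to the innermost throwable, stopping if a cycle is detected. */
    static Throwable rootCause(Throwable t) {
        // Identity-based set so that equals() overrides cannot confuse cycle detection.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    /** Formats the innermost cause as "fully.qualified.ClassName: message". */
    static String describe(Throwable t) {
        Throwable root = rootCause(t);
        return root.getClass().getName() + ": " + root.getMessage();
    }

    public static void main(String[] args) {
        // Stand-in for the wrapped exception in the report; a real caller
        // would catch the exception thrown by the parser instead.
        Exception wrapped = new Exception(
                "Unexpected RuntimeException from parser",
                new IllegalArgumentException("root cannot be null"));
        System.out.println(describe(wrapped));
    }
}
```

Run as written, main prints
"java.lang.IllegalArgumentException: root cannot be null".  This only changes
what the caller reports; it does not make the corrupt file parseable.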



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
