[ https://issues.apache.org/jira/browse/TIKA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jorge Spinsanti updated TIKA-2406:
----------------------------------
    Description: 
I got an IllegalArgumentException during text extraction from a PDF file (attached):
{code}
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@d71dc5e
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        ... 16 more
Caused by: java.lang.IllegalArgumentException: root cannot be null
        at org.apache.pdfbox.pdmodel.PDPageTree.<init>(PDPageTree.java:75)
        at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getPages(PDDocumentCatalog.java:129)
        at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:1381)
        at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:235)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:146)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        ... 23 more
{code}


> IllegalArgumentException in text extraction from PDF file
> ---------------------------------------------------------
>
>                 Key: TIKA-2406
>                 URL: https://issues.apache.org/jira/browse/TIKA-2406
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15
>            Reporter: Jorge Spinsanti
>         Attachments: IllegalArgumentException.pdf
>
>
> I got an IllegalArgumentException during text extraction from a PDF file (attached):
> {code}
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@d71dc5e
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ... 16 more
> Caused by: java.lang.IllegalArgumentException: root cannot be null
>       at org.apache.pdfbox.pdmodel.PDPageTree.<init>(PDPageTree.java:75)
>       at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getPages(PDDocumentCatalog.java:129)
>       at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:1381)
>       at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:235)
>       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:146)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ... 23 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
