Crash parsing pdf file 
(http://media.opentur.it/WEB/CHANNELS/COCKTAILVIAGGI/CMS/PDF/Irlanda%202009%2028-51pag.pdf)
 from Tika
----------------------------------------------------------------------------------------------------------------------------

                 Key: PDFBOX-617
                 URL: https://issues.apache.org/jira/browse/PDFBOX-617
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 0.8.0-incubator
         Environment: Linux debian: Linux 2.6.18-6-686 #1 SMP i686 GNU/Linux 
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode, sharing)
            Reporter: Stefano Falconetti
            Priority: Critical


Parsing the file 
http://media.opentur.it/WEB/CHANNELS/COCKTAILVIAGGI/CMS/PDF/Irlanda%202009%2028-51pag.pdf
 the call to Tika "parse" fails with the followinf stack trace:

java.io.IOException: org.apache.tika.exception.TikaException: TIKA-198: Illegal 
IOException from org.apache.tika.parser.pdf.pdfpar...@1578aab
        at 
com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.parse(GenericDocumentParserTikaImpl.java:143)
        at 
com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.main(GenericDocumentParserTikaImpl.java:306)
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal 
IOException from org.apache.tika.parser.pdf.pdfpar...@1578aab
        at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:126)
        at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114)
        at 
com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.parse(GenericDocumentParserTikaImpl.java:69)
        ... 1 more
Caused by: org.apache.pdfbox.exceptions.WrappedIOException
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:841)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:808)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:53)
        at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        ... 4 more
Caused by: java.util.NoSuchElementException
        at java.util.AbstractList$Itr.next(AbstractList.java:350)
        at 
org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
        at 
org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
        ... 8 more


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to