[ https://issues.apache.org/jira/browse/PDFBOX-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857704#action_12857704 ]
Stefano Falconetti commented on PDFBOX-617:
-------------------------------------------
That was the dependency specified by Tika, which I'm using. If 0.8.0 and
1.1.0 are fully compatible and Tika runs fine with the newer one, no problem,
I will give it a try.
> Crash parsing pdf file
> (http://media.opentur.it/WEB/CHANNELS/COCKTAILVIAGGI/CMS/PDF/Irlanda%202009%2028-51pag.pdf)
> from Tika
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-617
> URL: https://issues.apache.org/jira/browse/PDFBOX-617
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 0.8.0-incubator
> Environment: Linux debian: Linux 2.6.18-6-686 #1 SMP i686 GNU/Linux
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode, sharing)
> Reporter: Stefano Falconetti
> Priority: Critical
> Attachments: Irlanda125pag.pdf, Irlanda26-52pag.pdf,
> Portogallo2010.pdf, StatiUniti2010_1.pdf
>
>
> Parsing the file
> http://media.opentur.it/WEB/CHANNELS/COCKTAILVIAGGI/CMS/PDF/Irlanda%202009%2028-51pag.pdf
> the call to Tika "parse" fails with the following stack trace:
> java.io.IOException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.pdfpar...@1578aab
> at com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.parse(GenericDocumentParserTikaImpl.java:143)
> at com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.main(GenericDocumentParserTikaImpl.java:306)
> Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.pdfpar...@1578aab
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:126)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114)
> at com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.parse(GenericDocumentParserTikaImpl.java:69)
> ... 1 more
> Caused by: org.apache.pdfbox.exceptions.WrappedIOException
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:841)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:808)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:53)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
> ... 4 more
> Caused by: java.util.NoSuchElementException
> at java.util.AbstractList$Itr.next(AbstractList.java:350)
> at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
> at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
> ... 8 more
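
The trace above is a chain of wrapped exceptions, and the innermost "Caused by" (java.util.NoSuchElementException) is the one that points at the actual failure. As a rough sketch only, a caller could walk that chain with Throwable.getCause() to report the root cause. The class name RootCause and the helper rootCause are hypothetical names for illustration and are not part of Tika or PDFBox; the exceptions in main are stand-ins built by hand, not the real ones from the trace.

```java
import java.io.IOException;
import java.util.NoSuchElementException;

// Hypothetical helper, not part of Tika or PDFBox.
class RootCause {
    // Follow getCause() until there is no further cause, and return
    // the innermost throwable in the chain.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in chain mimicking the nesting in the report:
        // IOException -> (wrapper) -> NoSuchElementException.
        Throwable inner = new NoSuchElementException();
        Throwable wrapped = new IOException("outer", new RuntimeException("middle", inner));
        // prints java.util.NoSuchElementException
        System.out.println(rootCause(wrapped).getClass().getName());
    }
}
```

In a real caller the argument would be the exception caught around the Tika "parse" call, so that the log shows the innermost cause next to the outer TikaException message.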