Ken Krugler
Tue, 12 Jan 2010 14:18:46 -0800
Hi Doug,

On Jan 12, 2010, at 11:37am, Doug Carter wrote:
Hi all,

I'm new to Tika and to this mailing list, so I hope this is the right place to ask this question. I've just downloaded, built and installed Tika 0.5. I've been able to translate Microsoft Office documents without any problems. However, when I try to translate a PDF file, I get a parser exception.
Is this the case with any and all PDF files? Based on the stack trace below, it sure looks like a busted file, but I've mostly been working with the HTML parser.
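One rough way to test the "busted file" theory without involving Tika at all is to check the two markers every PDF is expected to carry: a "%PDF-" header at the start and a "%%EOF" marker near the end. The sketch below uses only the JDK; the class name PdfSanityCheck and the 1024-byte tail window are assumptions made for illustration, not anything from Tika or PDFBox. Passing this check does not prove the file is sound (the trace fails while parsing the cross-reference stream, which this does not inspect), but failing it would point strongly at a truncated or mislabeled file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika or PDFBox.
class PdfSanityCheck {

    /** Returns true if the given bytes start with the "%PDF-" header. */
    static boolean hasPdfHeader(byte[] head) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (head.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns true if the "%%EOF" marker appears anywhere in the given tail bytes. */
    static boolean hasEofMarker(byte[] tail) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        return text.contains("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfSanityCheck file.pdf");
            return;
        }
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            long length = file.length();

            // Read up to the first 8 bytes for the header check.
            byte[] head = new byte[(int) Math.min(8, length)];
            file.readFully(head);

            // Read up to the last 1024 bytes (assumed window) for the trailer check.
            int tailLength = (int) Math.min(1024, length);
            byte[] tail = new byte[tailLength];
            file.seek(length - tailLength);
            file.readFully(tail);

            System.out.println("header present:     " + hasPdfHeader(head));
            System.out.println("EOF marker present: " + hasEofMarker(tail));
        }
    }
}
```

Running it as "java PdfSanityCheck foo.pdf" on the failing file and on a PDF known to be good should show quickly whether foo.pdf is obviously damaged. Trying a few other PDFs through tika-app is still the more direct answer to the question above.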
-- Ken
The command line I'm running is:

% java -jar tika-app/target/tika-app-0.5.jar foo.pdf

The resulting exception output is:

Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.pdfpar...@11e1e67
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:126)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:175)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:62)
Caused by: org.apache.pdfbox.exceptions.WrappedIOException
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:841)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:808)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:53)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    ... 3 more
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(AbstractList.java:350)
    at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
    at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
    ... 7 more

---

Can someone help point me to a way to solve this problem? I'm familiar with Java but not the PDF format or how Tika parses a document. Please let me know if there is a better forum to ask this question, or if I need to provide more information.

TIA,
Doug
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g