[
https://issues.apache.org/jira/browse/TIKA-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078088#comment-17078088
]
Tim Allison commented on TIKA-3086:
-----------------------------------
Can you share a triggering file?
> not able to parse XLS file
> --------------------------
>
> Key: TIKA-3086
> URL: https://issues.apache.org/jira/browse/TIKA-3086
> Project: Tika
> Issue Type: Bug
> Components: tika-batch
> Affects Versions: 1.7, 1.17
> Reporter: kuladeep
> Priority: Major
>
> Hi Team,
> We are using Tika to parse different kinds of files, but for some XLS files we
> get the exception below. We currently use tika-app-1.7.jar and have also tried
> tika-app-1.17 and 1.20, but we still get the same exception. Please help
> us with this.
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@15aab8c6
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at TextExtraction.extract(TextExtraction.java:50)
>     at TextExtraction.main(TextExtraction.java:68)
> Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 1 bytes remaining still to be read.
>     at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:177)
>     at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:239)
>     at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
>     at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:156)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 4 more
>
> code we have written
> iStream = new FileInputStream(new File(fname));
> mData = new Metadata();
> cHandler = new BodyContentHandler(-1);
> adp = new AutoDetectParser(); //AutoDetectParser()OldExcelParser;
> System.out.println();
> ParseContext parseContext = new ParseContext();
> parseContext.set(Parser.class, adp);
> System.out.println(iStream+" "+cHandler+" "+mData+""+parseContext);
> System.out.println("Extracting ......\nPls wait..............\n");
> adp.parse(iStream, cHandler, mData, parseContext);
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)