kuladeep created TIKA-3086:
------------------------------
Summary: not able to parse XLS file
Key: TIKA-3086
URL: https://issues.apache.org/jira/browse/TIKA-3086
Project: Tika
Issue Type: Bug
Components: tika-batch
Affects Versions: 1.17, 1.7
Reporter: kuladeep
Hi Team,
We are using Tika to parse different kinds of files, but for some XLS files we
get the exception below. We currently use tika-app-1.7.jar, and we have also
tried tika-app-1.17 and 1.20, but we still get the same exception. Please help
us with this.
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.officepar...@15aab8c6
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@15aab8c6
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at TextExtraction.extract(TextExtraction.java:50)
    at TextExtraction.main(TextExtraction.java:68)
Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 1 bytes remaining still to be read.
    at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:177)
    at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:239)
    at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:156)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 4 more
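For context, our reading of the "left 1 bytes remaining" message is that each record in an XLS (BIFF) stream starts with a 2-byte record id and a 2-byte payload length, and that the error is raised when the record parser consumes fewer payload bytes than the header declares. The following is only a minimal sketch of that kind of check under this assumption. It is not POI's actual code, and the names BiffLeftoverSketch and check are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch only; not taken from Apache POI.
final class BiffLeftoverSketch {

    /**
     * Reads the 4-byte record header (little-endian record id, then payload
     * length) and compares the declared payload length with the number of
     * bytes a hypothetical record parser consumed.
     */
    static String check(byte[] record, int consumedByParser) {
        ByteBuffer buf = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        int sid = buf.getShort() & 0xFFFF;       // record id, e.g. 0x85
        int declared = buf.getShort() & 0xFFFF;  // payload length from the header
        int leftover = declared - consumedByParser;
        return leftover == 0
                ? "ok"
                : String.format("record 0x%X left %d bytes unread", sid, leftover);
    }

    public static void main(String[] args) {
        // Header: id 0x0085, declared payload length 8, followed by 8 payload bytes.
        byte[] record = {(byte) 0x85, 0x00, 0x08, 0x00, 1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(check(record, 8)); // parser consumed the whole payload
        System.out.println(check(record, 7)); // parser stopped one byte short
    }
}
```

If this reading is right, the failing files contain a BoundSheetRecord whose payload is one byte longer than the parser for that record expects.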
Code we have written:
iStream = new FileInputStream(new File(fname));
mData = new Metadata();
cHandler = new BodyContentHandler(-1); // -1 disables the write limit
adp = new AutoDetectParser(); //AutoDetectParser()OldExcelParser;
System.out.println();
ParseContext parseContext = new ParseContext();
parseContext.set(Parser.class, adp);
System.out.println(iStream + " " + cHandler + " " + mData + "" + parseContext);
System.out.println("Extracting ......\nPls wait..............\n");
adp.parse(iStream, cHandler, mData, parseContext);
--
This message was sent by Atlassian Jira
(v8.3.4#803005)