kuladeep created TIKA-3086:
------------------------------

             Summary: not able to parse XLS file
                 Key: TIKA-3086
                 URL: https://issues.apache.org/jira/browse/TIKA-3086
             Project: Tika
          Issue Type: Bug
          Components: tika-batch
    Affects Versions: 1.17, 1.7
            Reporter: kuladeep


Hi Team,

We are using Tika to parse different kinds of files, but for some XLS files we
get the exception below. We are currently using tika-app-1.7.jar, and we have
also tried tika-app-1.17 and 1.20, but we still get the same exception. Please
help us with this.

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@15aab8c6
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
	at TextExtraction.extract(TextExtraction.java:50)
	at TextExtraction.main(TextExtraction.java:68)
Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 1 bytes remaining still to be read.
	at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:177)
	at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:239)
	at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
	at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:156)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	... 4 more
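The exception message names a BIFF record: 0x85 is BOUNDSHEET (BoundSheetRecord). POI's LeftoverDataException means the record's declared length was larger than what the record parser actually read from it, here by one byte. The trace also shows the file was routed through OldExcelParser, which suggests Tika treated it as an older (Excel 5/95-era) workbook. As a hedged diagnostic sketch, the stdlib-only Java program below walks a raw BIFF record stream. Each record is a 2-byte little-endian id, followed by a 2-byte little-endian payload length, followed by the payload. The program flags BOUNDSHEET records whose length does not match the BIFF5/7 layout. It assumes the bytes are already a bare record stream, for example a BIFF2-4 file or the Book stream already extracted from the OLE2 container with another tool. The class name BiffRecordScan and its output format are hypothetical, and this is not how POI itself parses the record.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class BiffRecordScan {
    // Record id named in the exception: 0x85 = BOUNDSHEET (BoundSheetRecord)
    static final int SID_BOUNDSHEET = 0x85;

    /**
     * Hypothetical diagnostic. Walks a bare BIFF record stream (2-byte LE id,
     * 2-byte LE payload length, payload) and reports BOUNDSHEET records whose
     * declared length differs from the BIFF5/7 layout: 4-byte BOF offset,
     * visibility byte, sheet-type byte, 1-byte name length, then the name bytes.
     */
    static List<String> scan(byte[] data) {
        List<String> findings = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int offset = buf.position();
            int sid = Short.toUnsignedInt(buf.getShort());   // record id
            int size = Short.toUnsignedInt(buf.getShort());  // declared payload length
            if (size > buf.remaining()) {
                findings.add(String.format("record 0x%X at %d is truncated", sid, offset));
                break;
            }
            if (sid == SID_BOUNDSHEET && size >= 7) {
                // Absolute read of the name-length byte (payload offset 6)
                int nameLen = Byte.toUnsignedInt(buf.get(buf.position() + 6));
                int expected = 7 + nameLen;
                if (size != expected) {
                    findings.add(String.format(
                            "BoundSheet at %d: declared %d bytes, BIFF5 layout uses %d (%+d)",
                            offset, size, expected, size - expected));
                }
            }
            buf.position(buf.position() + size); // skip to the next record header
        }
        return findings;
    }

    // Appends one record (header + payload) to the output buffer
    static void putRecord(ByteArrayOutputStream out, int sid, byte[] payload) {
        ByteBuffer header = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        header.putShort((short) sid).putShort((short) payload.length);
        out.write(header.array(), 0, 4);
        out.write(payload, 0, payload.length);
    }

    public static void main(String[] args) {
        // Synthetic BOUNDSHEET for a sheet named "S1", with one extra trailing byte
        byte[] payload = {0, 0, 0, 0, 0, 0, 2, 'S', '1', 0};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        putRecord(out, SID_BOUNDSHEET, payload);
        scan(out.toByteArray()).forEach(System.out::println);
        // prints: BoundSheet at 0: declared 10 bytes, BIFF5 layout uses 9 (+1)
    }
}
```

A "+1" finding on a real file would match the "left 1 bytes remaining" message, but it would not by itself show which program wrote the extra byte.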

 

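Code we have written:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// fname is the path of the XLS file; try-with-resources closes the stream
try (InputStream iStream = new FileInputStream(new File(fname))) {
    Metadata mData = new Metadata();
    BodyContentHandler cHandler = new BodyContentHandler(-1); // -1 = no write limit
    AutoDetectParser adp = new AutoDetectParser();
    ParseContext parseContext = new ParseContext();
    parseContext.set(Parser.class, adp); // also parse embedded documents
    System.out.println("Extracting ......\nPls wait..............\n");
    adp.parse(iStream, cHandler, mData, parseContext);
}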
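In the trace, the TikaException is only a wrapper. The underlying failure is the POI LeftoverDataException listed under "Caused by". If a batch job should log such files and continue rather than abort, one option is to catch TikaException around adp.parse(...) and walk the cause chain. The helper below is a hypothetical, stdlib-only sketch; the class and method names are made up. With POI on the classpath, it would presumably be called as findCause(e, RecordInputStream.LeftoverDataException.class), assuming that nested class is public in the POI version bundled with your Tika. The helper does not fix the parse. It only classifies the failure.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;

public class CauseChain {
    /** Walks t and its causes; returns the first one that is an instance of type. */
    static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        // Identity set guards against (rare) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return Optional.of(type.cast(cur));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for TikaException -> LeftoverDataException from the report
        Exception wrapped = new Exception("Unexpected RuntimeException",
                new IllegalStateException("left 1 bytes remaining"));
        System.out.println(findCause(wrapped, IllegalStateException.class)
                .map(Throwable::getMessage).orElse("not found"));
        // prints: left 1 bytes remaining
    }
}
```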

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
