[ 
https://issues.apache.org/jira/browse/TIKA-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078088#comment-17078088
 ] 

Tim Allison commented on TIKA-3086:
-----------------------------------

Can you share a triggering file?

> not able to parse XLS file
> --------------------------
>
>                 Key: TIKA-3086
>                 URL: https://issues.apache.org/jira/browse/TIKA-3086
>             Project: Tika
>          Issue Type: Bug
>          Components: tika-batch
>    Affects Versions: 1.7, 1.17
>            Reporter: kuladeep
>            Priority: Major
>
> Hi Team,
> We are using Tika to parse different kinds of files, but for some XLS files
> we get the exception below. We currently use tika-app-1.7.jar, and we have
> also tried tika-app-1.17 and 1.20, but we still get the same exception.
> Please help us with this.
>
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.officepar...@15aab8c6
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@15aab8c6
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at TextExtraction.extract(TextExtraction.java:50)
>     at TextExtraction.main(TextExtraction.java:68)
> Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 1 bytes remaining still to be read.
>     at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:177)
>     at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:239)
>     at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
>     at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:156)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 4 more
>  
> Code we have written:
> iStream = new FileInputStream(new File(fname));
> mData = new Metadata();
> cHandler = new BodyContentHandler(-1);
> adp = new AutoDetectParser(); //AutoDetectParser()OldExcelParser;
> System.out.println();
> ParseContext parseContext = new ParseContext();
> parseContext.set(Parser.class, adp);
> System.out.println(iStream+"  "+cHandler+" "+mData+""+parseContext);
> System.out.println("Extracting ......\nPls wait..............\n");
> adp.parse(iStream, cHandler, mData, parseContext);
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
