[https://issues.apache.org/jira/browse/TIKA-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127060#comment-17127060]
Xiaohong Yang commented on TIKA-3107:
-------------------------------------
Thank you for the information. I filed the following bug in Apache POI.
Bug 64500 - LeftoverDataException: Initialisation of record
0x85(BoundSheetRecord) left 28 bytes remaining still to be read
([https://bz.apache.org/bugzilla/show_bug.cgi?id=64500]).
We do not know which software generated the sample file. Excel opens it
without problems.
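Some background on what the error message means: a BIFF (.xls) stream is a sequence of records. Each record starts with a 2-byte record type (the "sid", here 0x85) and a 2-byte body length, both little-endian. "Left 28 bytes remaining still to be read" therefore indicates that the code decoding record 0x85 stopped 28 bytes before the end of the length the record declared. The sketch below illustrates only that check. It is not Apache POI's implementation, and `BiffRecordCheck`, `readRecord`, the decoder callback, and this `LeftoverDataException` class are all hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.Consumer;

// Illustrative sketch, not Apache POI's implementation. All names are hypothetical.
public class BiffRecordCheck {

    // Hypothetical exception whose message is modeled on POI's LeftoverDataException.
    static class LeftoverDataException extends RuntimeException {
        LeftoverDataException(int sid, int remaining) {
            super(String.format(
                    "Initialisation of record 0x%X left %d bytes remaining still to be read.",
                    sid, remaining));
        }
    }

    /**
     * Reads one record header (2-byte sid, 2-byte body length, both little-endian),
     * passes a view limited to the declared body to the decoder, advances the
     * stream past the body, and throws if the decoder left any body bytes unread.
     * Returns the record's sid.
     */
    static int readRecord(ByteBuffer stream, Consumer<ByteBuffer> decoder) {
        stream.order(ByteOrder.LITTLE_ENDIAN);
        int sid = Short.toUnsignedInt(stream.getShort());
        int length = Short.toUnsignedInt(stream.getShort());

        // The view covers exactly the declared body; reading past it throws
        // BufferUnderflowException instead of spilling into the next record.
        ByteBuffer body = stream.slice();
        body.limit(length);
        body.order(ByteOrder.LITTLE_ENDIAN);

        decoder.accept(body);
        stream.position(stream.position() + length);

        if (body.hasRemaining()) {
            throw new LeftoverDataException(sid, body.remaining());
        }
        return sid;
    }

    public static void main(String[] args) {
        // A record with sid 0x85 that declares a 36-byte body (zero-filled here).
        ByteBuffer stream = ByteBuffer.allocate(4 + 36).order(ByteOrder.LITTLE_ENDIAN);
        stream.putShort((short) 0x85).putShort((short) 36);
        stream.rewind();

        try {
            // A hypothetical decoder that only understands the first 8 bytes.
            readRecord(stream, body -> {
                body.getInt();
                body.getInt();
            });
        } catch (LeftoverDataException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints "Initialisation of record 0x85 left 28 bytes remaining still to be read.", which has the same shape as the message in Bug 64500 but omits the record name.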
> AutoDetectParser.parse failed with error "Initialisation of record
> 0x85(BoundSheetRecord) left 28 bytes remaining still to be read"
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: TIKA-3107
> URL: https://issues.apache.org/jira/browse/TIKA-3107
> Project: Tika
> Issue Type: Bug
> Components: metadata, parser
> Affects Versions: 1.24
> Reporter: Xiaohong Yang
> Priority: Critical
> Attachments: SOJ.NW.00092712.xls
>
>
> When I try to get the metadata of the sample Excel file with the
> AutoDetectParser.parse method, using the following Java code, I get the error
> "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining
> still to be read".
>
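> import java.io.*;
> import org.apache.tika.config.TikaConfig;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.*;
> import org.apache.tika.sax.BodyContentHandler;
> // TikaConfigFactory is an application-specific helper, not part of Tika's API
> try (InputStream input = new FileInputStream(localFilePath)) {
>     BodyContentHandler handler = new BodyContentHandler(-1); // -1: no write limit
>     Metadata metadata = new Metadata();
>     TikaConfig config = TikaConfigFactory.getTikaConfig();
>     Parser autoDetectParser = new AutoDetectParser(config);
>     ParseContext context = new ParseContext();
>     context.set(TikaConfig.class, config);
>     autoDetectParser.parse(input, handler, metadata, context);
> }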
>
> Here is the stack trace:
>
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2caa5ec
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     …
>     at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>     at java.util.concurrent.FutureTask.run(FutureTask.java)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read.
>     at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:188)
>     at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:233)
>     at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
>     at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:158)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 15 more
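The trace shows the POI exception arriving wrapped in a TikaException ("Caused by:"). While the upstream bug is open, an application may want to recognise this specific failure, for example to log or skip such files. Below is a minimal sketch, assuming plain cause-chain inspection is acceptable. `CauseChain`, `findCause`, and `POI_LEFTOVER` are hypothetical names, not Tika or POI API. The class name string is copied from the "Caused by:" line.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Sketch of a generic cause-chain lookup; names are illustrative, not a Tika API.
public class CauseChain {

    // Class name as printed in the "Caused by:" line of the stack trace.
    static final String POI_LEFTOVER =
            "org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException";

    /**
     * Follows getCause() starting at t (inclusive) and returns the first
     * throwable whose class name equals className, or null if there is none.
     * Matching by name avoids a compile-time dependency on POI; the identity
     * set stops the walk if a cause chain ever loops back on itself.
     */
    static Throwable findCause(Throwable t, String className) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur.getClass().getName().equals(className)) {
                return cur;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for a TikaException wrapping a RuntimeException from a parser.
        Exception outer = new Exception("outer", new IllegalStateException("inner"));
        Throwable found = findCause(outer, "java.lang.IllegalStateException");
        System.out.println(found.getMessage()); // prints "inner"
        System.out.println(findCause(outer, POI_LEFTOVER)); // prints "null"
    }
}
```

In the reporter's code, the parse call could be wrapped in `catch (TikaException e)`, and a non-null `findCause(e, CauseChain.POI_LEFTOVER)` could be treated as this known failure. Whether skipping such files is acceptable depends on the application.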
--
This message was sent by Atlassian Jira
(v8.3.4#803005)