Seva Alekseyev created TIKA-2184:
------------------------------------
Summary: RecordFormatException on a valid Excel file
Key: TIKA-2184
URL: https://issues.apache.org/jira/browse/TIKA-2184
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.14
Environment: Windows 7 x64, JVM 1.8.0_101
Reporter: Seva Alekseyev
Attachments: HIVT Discrepancy Report- 3-29-04UCSF.xls
On the attached file, which opens fine with Excel, the Tika parser throws the
following:
org.apache.poi.hssf.record.RecordFormatException: Unhandled Continue Record followining class org.apache.poi.hssf.record.TabIdRecord
at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord:379
at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord:273
at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents:175
at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents:136
at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile:312
at org.apache.tika.parser.microsoft.ExcelExtractor.parse:169
at org.apache.tika.parser.microsoft.OfficeParser.parse:177
at org.apache.tika.parser.microsoft.OfficeParser.parse:130
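
The exception message suggests that the workbook stream contains a Continue record (sid 0x003C) directly after a TabId record (sid 0x013D), a combination the record reader does not accept. The following is a minimal, standard-library-only sketch of how that layout could be detected. It assumes the raw bytes of the Workbook stream have already been extracted from the OLE2 container (not shown here), and the class and method names are hypothetical, not part of POI or Tika. Whether the attached file actually has this layout is an assumption inferred from the message above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI or Tika.
public class ContinueAfterTabIdScan {
    public static final int SID_CONTINUE = 0x003C; // BIFF Continue record
    public static final int SID_TABID = 0x013D;    // BIFF TabId record

    /** Returns the byte offsets of Continue records that directly follow a TabId record. */
    public static List<Integer> findContinueAfterTabId(byte[] stream) {
        List<Integer> hits = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int previousSid = -1;
        // Each BIFF record is a 4-byte header (sid, length; both little-endian) plus payload.
        while (buf.remaining() >= 4) {
            int offset = buf.position();
            int sid = buf.getShort() & 0xFFFF;
            int length = buf.getShort() & 0xFFFF;
            if (length > buf.remaining()) {
                break; // truncated record: stop scanning
            }
            if (sid == SID_CONTINUE && previousSid == SID_TABID) {
                hits.add(offset);
            }
            buf.position(buf.position() + length); // skip the payload
            previousSid = sid;
        }
        return hits;
    }

    /** Builds one record (header plus payload) for synthetic test data. */
    public static byte[] record(int sid, byte[] payload) {
        ByteBuffer b = ByteBuffer.allocate(4 + payload.length).order(ByteOrder.LITTLE_ENDIAN);
        b.putShort((short) sid).putShort((short) payload.length).put(payload);
        return b.array();
    }

    /** Concatenates byte arrays in order. */
    public static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Synthetic data: a TabId record with a 4-byte payload, then a Continue record.
        byte[] stream = concat(
                record(SID_TABID, new byte[] {0, 0, 1, 0}),
                record(SID_CONTINUE, new byte[] {2, 0}));
        System.out.println(findContinueAfterTabId(stream)); // prints [8]
    }
}
```

Running the scan on the Workbook stream of the attached file would show whether the TabId data really spills into a Continue record, which would help narrow down where the parser needs to be more lenient.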