tika-user  

Re: Exception threw when filtering the attached Excel using tika-app-0.4.jar

Jukka Zitting
Fri, 04 Dec 2009 00:31:32 -0800

Hi,

On Fri, Dec 4, 2009 at 3:37 AM, Li Leon <leon800...@gmail.com> wrote:
> I got an exception when filtering the attached Excel file using "type
> bugs.xls | java -jar tika-app-0.4.jar -".
>
> Any ideas? The embedded object seemed to cause the problem.

Yep, I can see the problem too. The exception is coming from the
Apache POI library that Tika uses for parsing Microsoft file formats.
Can you file a bug report about this in the POI issue tracker at [1]?
The problem might be related to the already reported bug #47685 [2].

[1] https://issues.apache.org/bugzilla/buglist.cgi?product=POI
[2] https://issues.apache.org/bugzilla/show_bug.cgi?id=47685

PS. For the record, the exception stack trace is included below.

BR,

Jukka Zitting

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.officepar...@651dba45
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:175)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:62)
Caused by: org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
        at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:64)
        at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:263)
        at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:270)
        at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:236)
        at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:122)
        at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:85)
        at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:145)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:114)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        ... 3 more
Caused by: org.apache.poi.hssf.record.RecordFormatException: Ran out of record data trying to read formula. fields: (option=-12 index=11540 not_used=353 name=''')
        at org.apache.poi.hssf.record.ExternalNameRecord.readFail(ExternalNameRecord.java:177)
        at org.apache.poi.hssf.record.ExternalNameRecord.<init>(ExternalNameRecord.java:164)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:56)
        ... 11 more
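
The nested "Caused by" entries in the trace above are how the failure can be traced from the Tika wrapper exception down to POI. As a rough sketch of doing the same thing in code, the snippet below walks an exception's cause chain to its innermost entry using only the standard Java library. The class name RootCause and the stand-in exceptions in main are hypothetical and chosen for illustration; in real use the outer exception would presumably be the TikaException caught around a parse call.

```java
public class RootCause {

    /** Follows getCause() until the innermost Throwable is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referencing cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in exceptions mimicking the three-level chain in the trace
        // (hypothetical types; the real ones are TikaException and
        // RecordFormatException).
        Exception inner = new IllegalStateException(
                "Ran out of record data trying to read formula");
        Exception middle = new RuntimeException(
                "Unable to construct record instance", inner);
        Exception outer = new Exception("Unexpected RuntimeException", middle);

        Throwable root = rootCause(outer);
        // The package of the root exception's class hints at which library failed.
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Checking the class name of the innermost exception (org.apache.poi... in the trace above) is usually enough to decide which project's issue tracker the report belongs in.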