https://issues.apache.org/bugzilla/show_bug.cgi?id=54213

            Bug ID: 54213
           Summary: Exception parsing XLS embedded in PPT file
           Product: POI
           Version: 3.8
          Hardware: PC
                OS: Mac OS X 10.4
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HSLF
          Assignee: [email protected]
          Reporter: [email protected]
    Classification: Unclassified

Created attachment 29644
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=29644&action=edit
Extracted 1.xls file

This is a spinoff from https://issues.apache.org/jira/browse/TIKA-1033

I used Tika to extract embedded documents from the attached emb.ppt.  One of
those documents is a chart; Tika detects it as an Excel document, and
TikaCLI -z saves it as 1.xls (attached).

But when I try to parse 1.xls with Tika, it hits an exception:

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@4eaf6cb1
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:138)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:399)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:121)
Caused by: org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
    at org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:65)
    at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:301)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:285)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:251)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:143)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:106)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:292)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:144)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:194)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 5 more
Caused by: org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) to read requested (2) bytes
    at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:216)
    at org.apache.poi.hssf.record.RecordInputStream.readShort(RecordInputStream.java:233)
    at org.apache.poi.hssf.record.WindowOneRecord.<init>(WindowOneRecord.java:71)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:57)
    ... 15 more

However, Excel 2007 also cannot open 1.xls, so I'm not sure where the bug
really is: in Tika's extraction of 1.xls from emb.ppt, or in Tika/POI's
parsing of 1.xls.
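
One way to narrow this down might be to check the raw record stream for a
WINDOW1 record that is shorter than the parser expects, since the innermost
exception says WindowOneRecord ran out of data mid-read. The sketch below is
only an illustration, not part of Tika or POI. It assumes the Workbook stream
has already been pulled out of the OLE2 container into a byte array, that
records use the usual 2-byte id plus 2-byte length header (little-endian),
and that WINDOW1 has id 0x003D with an 18-byte body as in BIFF8. The class
and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BiffRecordScan {
    // Assumption: BIFF8 WINDOW1 record id and body size (nine 16-bit fields).
    static final int WINDOW1_SID = 0x003D;
    static final int WINDOW1_EXPECTED_LEN = 18;

    /** Reads an unsigned little-endian 16-bit value at the given offset. */
    static int ushortLE(byte[] buf, int off) {
        return (buf[off] & 0xFF) | ((buf[off + 1] & 0xFF) << 8);
    }

    /**
     * Walks the record headers in a raw workbook stream and reports every
     * WINDOW1 record whose declared length is below the expected size.
     */
    static List<String> findShortWindow1(byte[] stream) {
        List<String> problems = new ArrayList<>();
        int pos = 0;
        while (pos + 4 <= stream.length) {
            int sid = ushortLE(stream, pos);
            int len = ushortLE(stream, pos + 2);
            if (sid == WINDOW1_SID && len < WINDOW1_EXPECTED_LEN) {
                problems.add("WINDOW1 at offset " + pos + " has length " + len);
            }
            pos += 4 + len; // skip the header and the record body
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a WINDOW1 header declaring only 2 body bytes.
        byte[] sample = { 0x3D, 0x00, 0x02, 0x00, 0x00, 0x00 };
        System.out.println(findShortWindow1(sample));
    }
}
```

If a scan like this flags a short WINDOW1 record in 1.xls, that would suggest
the extracted stream itself differs from what the HSSF record classes expect,
rather than a problem in how Tika invokes the parser.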

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
