[ https://issues.apache.org/jira/browse/TIKA-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178632#comment-13178632 ]
Nick Burch commented on TIKA-826:
---------------------------------
Should be fixed in r1226651. Neither parser now claims the format, and if a
file reaches the OOXML parser on the basis of its parent type, the parser
declines it. Tests have also been added for these cases.
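
For readers who want to picture the "claim vs. decline" behaviour described
above, here is a minimal, self-contained sketch. It is not Tika's actual code:
the class ParserSelectionSketch, the SketchParser type, the PARENT table and
the select() helper are all hypothetical names made up for illustration, and
the assumption that the .xlsb type has application/x-tika-ooxml as its parent
is taken only from the "parent type" remark in the comment.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class ParserSelectionSketch {
    static final String XLS = "application/vnd.ms-excel";
    static final String XLSX =
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    static final String XLSB =
            "application/vnd.ms-excel.sheet.binary.macroenabled.12";
    static final String OOXML = "application/x-tika-ooxml";

    // Hypothetical supertype table (child -> parent); an assumption for this sketch.
    static final Map<String, String> PARENT = Map.of(XLSX, OOXML, XLSB, OOXML);

    // Hypothetical stand-in for a parser: types it claims, and types it declines
    // even when they would otherwise match through a parent type.
    static final class SketchParser {
        final String name;
        final Set<String> claimed;
        final Set<String> declined;

        SketchParser(String name, Set<String> claimed, Set<String> declined) {
            this.name = name;
            this.claimed = claimed;
            this.declined = declined;
        }

        boolean accepts(String type) {
            if (declined.contains(type)) {
                return false; // explicitly declined, regardless of parent type
            }
            // Walk up the supertype chain looking for a claimed type.
            for (String t = type; t != null; t = PARENT.get(t)) {
                if (claimed.contains(t)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Neither parser claims XLSB; the OOXML one also declines it explicitly.
    static final List<SketchParser> PARSERS = List.of(
            new SketchParser("office", Set.of(XLS), Set.of()),
            new SketchParser("ooxml", Set.of(OOXML, XLSX), Set.of(XLSB)));

    // Returns the name of the first parser that accepts the type, if any.
    static Optional<String> select(String type) {
        for (SketchParser p : PARSERS) {
            if (p.accepts(type)) {
                return Optional.of(p.name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println("xls  -> " + select(XLS));
        System.out.println("xlsx -> " + select(XLSX));
        System.out.println("xlsb -> " + select(XLSB));
    }
}
```

In this sketch select(XLSB) returns an empty Optional, so no parser is handed
a file it cannot read, while .xls and .xlsx still resolve to the "office" and
"ooxml" entries respectively. Without the declined set, the .xlsb type would
match the OOXML entry through its parent type.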
> TikaException / OfficeXmlFileException with .xlsb files
> -------------------------------------------------------
>
> Key: TIKA-826
> URL: https://issues.apache.org/jira/browse/TIKA-826
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.1
> Environment: Windows 7
> Reporter: John Mastarone
> Fix For: 1.1
>
> Attachments: TIKA-826.patch
>
>
> The file testEXCEL.xlsb in the tika-parsers test-documents folder causes a
> POI OfficeXmlFileException when one tries to open it with TikaGUI or TikaCLI
> using the latest build. The reason: Tika has it configured to be opened with
> the OfficeParser class rather than the OOXMLParser class; it is an Office
> 2007 file and should be opened with the OOXMLParser class. Neither the
> ExcelParserTest class nor the OOXMLParserTest class covers .xlsb files.
> Once these two parsers are changed so that the OOXMLParser is used (I'll
> submit a patch shortly), the OfficeXmlFileException goes away and a new POI
> exception (IllegalArgumentException in the ExtractorFactory class) arises in
> its place. It is somewhat related to the unresolved POI bug 51921, whose
> reporter mentions a .xlsb file among others. This exception appears to occur
> because POI does not seem to handle .xlsb files at all: a cursory search of
> the POI source for "xlsb" or its MIME type yields nothing relevant, and the
> project has no .xlsb test files that I can see.