[ https://issues.apache.org/jira/browse/TIKA-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227709#comment-14227709 ]
Hudson commented on TIKA-1487:
------------------------------
SUCCESS: Integrated in tika-trunk-jdk1.7 #336 (See
[https://builds.apache.org/job/tika-trunk-jdk1.7/336/])
TIKA-1487 Based on the file format docs from OpenOffice, add detection and mime
types for the older Excel 2, 3 and 4 pre-ole2 formats (nick:
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1642152)
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
> Add mime for pre-OLE2 xls file
> ------------------------------
>
> Key: TIKA-1487
> URL: https://issues.apache.org/jira/browse/TIKA-1487
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Trivial
> Fix For: 1.7
>
> Attachments: 004444.xls
>
>
> On the govdocs1 corpus, nearly 91% of xls exceptions have this stacktrace:
> {noformat}
> Caused by: java.io.IOException: Invalid header signature; read 0x0010000000060409, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document
>     at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:140)
>     at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:115)
>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:198)
>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:162)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     ... 13 more
> {noformat}
> Excel is able to open the few files that I tried, and it looks like Excel
> thinks these are version 4.
> On the POI user list, [~gagravarr] identified this header as pre-OLE2 and
> asked that we add the mime type to Tika so that we can handle these files
> appropriately. Test file soon to be attached.
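
For illustration only, here is a rough sketch of how the two header values in the stack trace above can be told apart. It is not Tika's or POI's actual code (the commit adds the detection to tika-mimetypes.xml). The class name, method name and returned labels are hypothetical, and the BIFF2/3/4 BOF record ids and lengths are an assumption taken from the OpenOffice file format documentation as I recall it. POI reads the first 8 bytes as a little-endian long, so the sketch does the same.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Tika or POI.
public class XlsHeaderSniffer {
    // OLE2 magic as POI reports it: the first 8 bytes read as a little-endian long.
    static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;

    // Returns "ole2", "biff2", "biff3", "biff4" or "unknown" (labels are made up here).
    static String classify(byte[] header) {
        if (header == null || header.length < 8) {
            return "unknown";
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getLong(0) == OLE2_SIGNATURE) {
            return "ole2";
        }
        // Pre-OLE2 files start directly with a BOF record:
        // 2-byte record id, then 2-byte record length (assumed values below).
        int recordId = buf.getShort(0) & 0xFFFF;
        int recordLen = buf.getShort(2) & 0xFFFF;
        if (recordId == 0x0009 && recordLen == 4) {
            return "biff2";
        }
        if (recordId == 0x0209 && recordLen == 6) {
            return "biff3";
        }
        if (recordId == 0x0409 && recordLen == 6) {
            return "biff4";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // The 8 bytes behind "read 0x0010000000060409" in the stack trace.
        byte[] fromTrace = {0x09, 0x04, 0x06, 0x00, 0x00, 0x00, 0x10, 0x00};
        System.out.println(classify(fromTrace));
    }
}
```

Under these assumptions the header from the trace falls into the BIFF4 branch, which would agree with Excel treating the files as version 4.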
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)