Marco Quaranta created TIKA-976:
-----------------------------------

             Summary: Inaccurate XLS detection through POIFSContainerDetector
                 Key: TIKA-976
                 URL: https://issues.apache.org/jira/browse/TIKA-976
             Project: Tika
          Issue Type: Improvement
          Components: mime
    Affects Versions: 1.2
            Reporter: Marco Quaranta
         Attachments: test_book.xls

I've found an inaccurate detection with the attached XLS file.
POIFSContainerDetector is unable to determine the exact MIME type (vnd.ms-excel)
and returns the generic "x-tika-msoffice" instead. This happens because the
file's root entry names are [Book, DocumentSummaryInformation, SummaryInformation],
while POIFSContainerDetector only checks whether the names contain "Workbook".
Would it be possible to add a further or-check like this:

if (names.contains("Workbook") || names.contains("Book"))
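
To illustrate the effect of that extra check, here is a minimal standalone sketch. It is not Tika's actual code: the class BookEntryCheck and the method detectFromNames are hypothetical names made up for this example, and the full "application/..." type strings are assumed from the short names mentioned above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the name-based check; not part of Tika.
public class BookEntryCheck {

    // Maps the set of root entry names to a media type string.
    // Assumption: both "Workbook" and "Book" mark an Excel file.
    public static String detectFromNames(Set<String> names) {
        if (names.contains("Workbook") || names.contains("Book")) {
            return "application/vnd.ms-excel";
        }
        // Fall back to the generic OLE2 container type.
        return "application/x-tika-msoffice";
    }

    public static void main(String[] args) {
        // Root entry names reported for the attached test_book.xls.
        Set<String> names = new HashSet<>(Arrays.asList(
                "Book", "DocumentSummaryInformation", "SummaryInformation"));
        System.out.println(detectFromNames(names));
    }
}
```

With the names from the attached file this prints application/vnd.ms-excel, whereas a check for "Workbook" alone would fall through to the generic type.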

Thank you,
Marco
