ContainerAwareDetector doesn't support non-MSOffice files which use the same 
magic
----------------------------------------------------------------------------------

                 Key: TIKA-486
                 URL: https://issues.apache.org/jira/browse/TIKA-486
             Project: Tika
          Issue Type: Improvement
            Reporter: Antoni Mylka


There are many applications which use the MSOffice magic number (the OLE2 
compound document signature). I know of Corel Presentations X3, Corel Quattro 
Pro 7 and X3, and Microsoft Works Word Processor. They have their own mime 
types.

They aren't properly supported by POI, though, which means that if the 
ContainerAwareDetector finds such a file, it will resort to the 
POIFSContainerDetector and return the basic application/x-tika-msoffice type, 
because POI won't be able to say anything more specific. This happens even 
when the fallback detector could come up with a better answer.

That's why IMHO the fallback detector should also be consulted whenever the 
POIFSContainerDetector returns x-tika-msoffice. If the fallback detector comes 
up with a more specific type, that type should be returned instead.
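
To make the proposal concrete, here is a rough, self-contained sketch of the 
decision logic. It deliberately does not use the real Tika API: the 
SimpleDetector interface, the class name and the detect() helper are all 
hypothetical stand-ins, and "more specific" is simplified to "anything other 
than x-tika-msoffice or application/octet-stream". A real patch would 
presumably compare types through Tika's own media type handling instead.

```java
public class FallbackDetectionSketch {

    // Generic type reported when the container-based detection cannot be more specific.
    public static final String GENERIC_OLE2 = "application/x-tika-msoffice";

    // Type a detector reports when it has no idea at all.
    public static final String OCTET_STREAM = "application/octet-stream";

    // Hypothetical stand-in for a detector (NOT the Tika Detector interface):
    // maps the leading bytes of a file to a media type string.
    public interface SimpleDetector {
        String detect(byte[] header);
    }

    // Proposed behaviour: trust the container detector unless it only returns
    // the generic OLE2 type; in that case ask the fallback detector and prefer
    // its answer when it is more specific.
    public static String detect(byte[] header,
                                SimpleDetector containerDetector,
                                SimpleDetector fallbackDetector) {
        String containerType = containerDetector.detect(header);
        if (!GENERIC_OLE2.equals(containerType)) {
            // The container detector already gave a specific answer.
            return containerType;
        }

        String fallbackType = fallbackDetector.detect(header);

        // Simplifying assumption: any answer that is neither the generic OLE2
        // type nor octet-stream counts as "more specific".
        boolean moreSpecific = fallbackType != null
                && !GENERIC_OLE2.equals(fallbackType)
                && !OCTET_STREAM.equals(fallbackType);

        return moreSpecific ? fallbackType : containerType;
    }
}
```

With this logic, a file for which the container detector only reports 
x-tika-msoffice but the fallback detector recognises a vendor type gets the 
vendor type. Files that POI does identify are unaffected, and files that 
neither detector recognises still come back as x-tika-msoffice.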

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
