[ https://issues.apache.org/jira/browse/TIKA-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-486.
-----------------------------

         Assignee: Nick Burch
    Fix Version/s: 0.8
       Resolution: Fixed

Thanks for the sample files. I've added basic MIME type entries for them in
r993098.

In r993108, I've also added detection support for them to the OLE2 container
detector, plus some logic in the parent detector that should help in the
unknown case. I think this covers the case you found earlier, but in a more
general way that works across all container detectors.

> ContainerAwareDetector doesn't support non-MSOffice files which use the same magic
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-486
>                 URL: https://issues.apache.org/jira/browse/TIKA-486
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Antoni Mylka
>            Assignee: Nick Burch
>             Fix For: 0.8
>
>         Attachments: test-documents.zip, tika-non-office-files-with-office-magic.patch
>
>
> There are many applications which use the MSOffice (OLE2) magic number. I know
> of Corel Presentations X3, Corel Quattro Pro 7 and X3, and the Microsoft Works
> Word Processor. They have their own MIME types.
> They aren't properly supported by POI, though, which means that if the
> ContainerAwareDetector finds such a file, it will resort to the
> POIFSContainerDetector and return the basic application/x-tika-msoffice type,
> because POI can't say anything more specific. This happens even when the
> fallback detector might come up with a better answer.
> That's why IMHO the fallback detector should be used if the
> POIFSContainerDetector returns application/x-tika-msoffice. If the fallback
> detector comes up with a more specific type, the more specific one should be
> used.
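
The rule proposed above is a simple precedence check: trust the container
detector when it names a concrete format, otherwise ask the fallback detector
and keep whichever answer is more specific. The Java sketch below illustrates
that rule under stated assumptions: SimpleDetector is a hypothetical stand-in
for Tika's detector interface (which reads a stream and returns a MediaType),
media types are plain strings, and the three-level specificity ranking
(unknown, generic OLE2, concrete format) is an assumption of this sketch, not
necessarily how r993108 implements it. application/vnd.ms-works is only an
example value for a fallback result.

```java
// Sketch of the fallback rule proposed in TIKA-486. SimpleDetector and the
// specificity ranking are hypothetical illustrations, not Tika's real API.
public class FallbackAwareDetection {

    /** Hypothetical stand-in for a type detector; Tika's real Detector reads a stream. */
    interface SimpleDetector {
        String detect(byte[] data);
    }

    static final String OCTET_STREAM = "application/octet-stream";
    static final String GENERIC_OLE2 = "application/x-tika-msoffice";

    /** Higher rank means a more specific answer (assumed ordering for this sketch). */
    static int specificity(String type) {
        if (type == null || OCTET_STREAM.equals(type)) {
            return 0; // nothing known about the file
        }
        if (GENERIC_OLE2.equals(type)) {
            return 1; // "some OLE2 file", nothing more
        }
        return 2; // a concrete format
    }

    static String detect(byte[] data, SimpleDetector container, SimpleDetector fallback) {
        String containerType = container.detect(data);
        if (specificity(containerType) == 2) {
            return containerType; // container detector already named the exact format
        }
        // Container result is generic or unknown, so consult the fallback detector too.
        String fallbackType = fallback.detect(data);
        if (specificity(fallbackType) > specificity(containerType)) {
            return fallbackType;
        }
        return containerType == null ? OCTET_STREAM : containerType;
    }

    public static void main(String[] args) {
        byte[] file = new byte[0]; // the stub detectors below ignore the bytes
        SimpleDetector poiLike = data -> GENERIC_OLE2;
        SimpleDetector magicLike = data -> "application/vnd.ms-works";
        SimpleDetector clueless = data -> OCTET_STREAM;

        System.out.println(detect(file, poiLike, magicLike)); // application/vnd.ms-works
        System.out.println(detect(file, poiLike, clueless));  // application/x-tika-msoffice
    }
}
```

Ranking by specificity, rather than always preferring the fallback, keeps the
generic OLE2 answer when the fallback knows even less (for example, when it
only reports application/octet-stream).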

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
