[ https://issues.apache.org/jira/browse/TIKA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154154#comment-13154154 ]

Jukka Zitting commented on TIKA-786:
------------------------------------

Cool, looks good. I was simultaneously approaching this from a slightly 
different angle (see 
https://github.com/jukka/tika/commit/97a15bdcd79549d3c5147b7b8f9b6f46a9bb8fc5), 
but your changes look nicer (I like the way you can give preference to non-Tika 
detectors) so let's go with that.
                
> Tika CLI --detect returns incorrect content-type for files with altered 
> extensions
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-786
>                 URL: https://issues.apache.org/jira/browse/TIKA-786
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.1
>         Environment: Windows
>            Reporter: John Mastarone
>            Priority: Minor
>
> From a discussion on the user mailing list on Nov. 11, 2011, where the
> following was requested as a new bug: Tika CLI returns incorrect content-type
> information when called with --detect on files whose extensions have been
> changed (and nothing else). MS Word (.doc) documents renamed to .xls or .ppt
> are incorrectly detected as Excel or PowerPoint documents, whereas the
> --metadata option determines the content type correctly (as
> application/msword) from the actual contents of these misnamed files. The
> same also occurs with other types of MS Office 2003 documents, and could
> possibly occur with a wide range of document types.
> To quote Nick B., from the user mailing list: "If you look at the
> TestMediaTypes class you'll see what you can get with just the mime magic and
> filenames, and then there's TestContainerAwareDetector which shows the
> correct detection happening by using the extra detectors available".

--
This message is automatically generated by JIRA.
