[ https://issues.apache.org/jira/browse/TIKA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154154#comment-13154154 ]
Jukka Zitting commented on TIKA-786:
------------------------------------
Cool, looks good. I was simultaneously approaching this from a slightly
different angle (see
https://github.com/jukka/tika/commit/97a15bdcd79549d3c5147b7b8f9b6f46a9bb8fc5),
but your changes look nicer (I like the way you can give preference to non-Tika
detectors) so let's go with that.
> Tika CLI --detect returns incorrect content-type for files with altered
> extensions
> ----------------------------------------------------------------------------------
>
> Key: TIKA-786
> URL: https://issues.apache.org/jira/browse/TIKA-786
> Project: Tika
> Issue Type: Bug
> Components: cli
> Affects Versions: 1.1
> Environment: Windows
> Reporter: John Mastarone
> Priority: Minor
>
> From a discussion on the user mailing list on Nov. 11, 2011, where the
> following was requested as a new bug: Tika CLI returns incorrect content
> type information when called with --detect on files whose extensions have
> been modified (and nothing else). MS Word (.doc) documents whose extension
> has been changed to .xls or .ppt are incorrectly detected as Excel or
> PowerPoint documents, whereas the --metadata option determines the content
> type correctly (as application/msword), based on the actual contents of
> these misnamed files. The same occurs with other types of MS Office 2003
> documents, and could possibly occur with a wide range of document types.
> To quote Nick B., from the user mailing list: "If you look at the
> TestMediaTypes class you'll see what you can get with just the mime magic and
> filenames, and then there's TestContainerAwareDetector which shows the
> correct detection happening by using the extra detectors available".
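The difference described above can be illustrated with a small, self-contained sketch. It does not use Tika's classes and is not the patch attached to this issue; the class and method names (DetectionSketch, detectByMagicAndName, detectByContainer) are hypothetical, and the set of entry names stands in for what a real container-aware detector would read from the file. The point it shows: legacy .doc, .xls and .ppt files share the same OLE2 signature, so magic bytes plus the file name can only follow the extension, while looking at the entries inside the container identifies the actual format.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

public class DetectionSketch {
    // OLE2 compound document signature shared by legacy .doc, .xls and .ppt files.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static boolean isOle2(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Magic bytes plus file name only: the extension decides which OLE2 subtype is reported.
    static String detectByMagicAndName(byte[] header, String fileName) {
        if (!isOle2(header)) return "application/octet-stream";
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".doc")) return "application/msword";
        if (lower.endsWith(".xls")) return "application/vnd.ms-excel";
        if (lower.endsWith(".ppt")) return "application/vnd.ms-powerpoint";
        return "application/x-tika-msoffice"; // generic OLE2 type, used here as a fallback
    }

    // Container-aware: the entries inside the OLE2 file decide, and the file name is ignored.
    // entryNames is an assumed input; a real detector would read it from the file itself.
    static String detectByContainer(byte[] header, Set<String> entryNames) {
        if (!isOle2(header)) return "application/octet-stream";
        if (entryNames.contains("WordDocument")) return "application/msword";
        if (entryNames.contains("Workbook")) return "application/vnd.ms-excel";
        if (entryNames.contains("PowerPoint Document")) return "application/vnd.ms-powerpoint";
        return "application/x-tika-msoffice";
    }

    public static void main(String[] args) {
        // A Word document that has been renamed to report.xls.
        byte[] header = OLE2_MAGIC.clone();
        Set<String> entries = Collections.singleton("WordDocument");

        System.out.println(detectByMagicAndName(header, "report.xls")); // application/vnd.ms-excel
        System.out.println(detectByContainer(header, entries));         // application/msword
    }
}
```

The first result mirrors what --detect reports for the renamed file, and the second mirrors what --metadata reports after actually parsing it.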