[
https://issues.apache.org/jira/browse/TIKA-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177076#comment-15177076
]
Manisha Kampasi commented on TIKA-1882:
---------------------------------------
Hi Nick,
I based my analysis on the following sources of information:
1. http://www.opensource.apple.com/source/file/file-23/file/magic/magic.mime
2. http://www.filesignatures.net/index.php?search=MOV&mode=EXT
3. http://www.garykessler.net/library/file_sigs.html
I did not find the "free" and "wide" patterns in the MP4 files of the data set I
am working with. However, since you did, these patterns are evidently not good
indicators of one container over the other and can be removed.
Thanks,
Manisha
> Updating the tika-mimetypes.xml for new mime magic patterns
> -----------------------------------------------------------
>
> Key: TIKA-1882
> URL: https://issues.apache.org/jira/browse/TIKA-1882
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.11
> Reporter: Manisha Kampasi
> Priority: Minor
> Labels: patch
>
> The following MIME magic patterns can be added to improve detection of these MIME types:
> 1. application/vnd.ms-cab-compressed (.cab files) - pattern "MSCF" in the first 4 bytes
> 2. application/vnd.xara (.xar files) - pattern "xar!" in the first 4 bytes
> 3. application/x-mobipocket-ebook (.mobi files) - pattern "BOOKMOBI" starting
> at byte position 60
> 4. video/quicktime (.mov files) - patterns "free" and "wide" seen starting at
> byte position 4
> The changes can be seen here:
> https://github.com/mkampasi/tika/commit/f7433daf434a44937ba3ae8b15813a768f95e334
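The proposed rules are all "fixed byte string at a fixed offset" checks. The following is a minimal, hypothetical Python sketch of that kind of matching; it is not Tika's detector or API, and `MAGIC_RULES` and `detect` are illustrative names only. It uses `MSCF`, the signature Microsoft Cabinet files begin with, for the .cab rule. The final call assumes an MP4 stream whose first box is a `free` box, mirroring the comment's point that such files also trip the QuickTime rule.

```python
from typing import Optional

# Hypothetical rule table: (MIME type, byte pattern, offset), taken from the issue text.
MAGIC_RULES = [
    ("application/vnd.ms-cab-compressed", b"MSCF", 0),
    ("application/vnd.xara", b"xar!", 0),
    ("application/x-mobipocket-ebook", b"BOOKMOBI", 60),
    ("video/quicktime", b"free", 4),
    ("video/quicktime", b"wide", 4),
]


def detect(header: bytes) -> Optional[str]:
    """Return the first MIME type whose pattern appears at its offset, else None."""
    for mime, pattern, offset in MAGIC_RULES:
        # Slicing past the end yields a short (or empty) result, so short input never matches.
        if header[offset:offset + len(pattern)] == pattern:
            return mime
    return None


print(detect(b"MSCF" + b"\x00" * 28))     # application/x-... see below
```

Correcting the example output comments, the intended usage is:

```python
print(detect(b"MSCF" + b"\x00" * 28))     # application/vnd.ms-cab-compressed
print(detect(b"\x00" * 60 + b"BOOKMOBI"))  # application/x-mobipocket-ebook
# Assumed MP4-like input: 4-byte box size, then a "free" box type at byte 4.
print(detect((8).to_bytes(4, "big") + b"free"))  # video/quicktime
```

Because the last input satisfies the same offset-4 check as a .mov file, a pattern like this cannot by itself distinguish QuickTime from MP4.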
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)