[
https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725568#comment-17725568
]
Gregory Lepore commented on TIKA-3999:
--------------------------------------
There is a chance of collisions with other magic numbers, but at the time I
created most of the above there was no collision with anything in my test
environment, which covered all formats documented in PRONOM
(https://www.nationalarchives.gov.uk/PRONOM/Default.aspx) and most of what
`file` identifies, plus around 500,000 unidentified formats.
That being said, I am glad to hear that your regression suite will verify the
above.
Is it possible to use the shorter magic numbers in addition to a specific file
extension to limit misidentifications? I don't know what the process is for
format identification in Tika...
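To make the question concrete, here is a rough sketch of what I have in mind.
It is not how Tika works (I don't know that), and the rule table, the
`identify` function and the 8-byte threshold are all made up for illustration.
The two signatures are the commonly documented ones for XM and ProTracker MOD,
not necessarily the ones I listed above. A long magic number is trusted on its
own; a short one only counts when the file extension agrees.

```python
from pathlib import PurePath

# Hypothetical rule table: (mime type, offset, magic bytes, allowed extensions).
RULES = [
    ("audio/xm", 0, b"Extended Module: ", {".xm"}),
    ("audio/x-mod", 1080, b"M.K.", {".mod"}),
]

# Assumed threshold: magic numbers shorter than this need extension support.
MIN_STANDALONE_MAGIC = 8


def identify(filename, data):
    """Return a mime type, or None if no rule matches confidently."""
    ext = PurePath(filename).suffix.lower()
    for mime, offset, magic, extensions in RULES:
        # The magic bytes must be present at the expected offset.
        if data[offset:offset + len(magic)] != magic:
            continue
        # Long magic is accepted alone; short magic also needs the extension.
        if len(magic) >= MIN_STANDALONE_MAGIC or ext in extensions:
            return mime
    return None
```

With this, a file carrying the 4-byte "M.K." marker is only reported as
audio/x-mod when it is also named *.mod, while the 17-byte XM header is
accepted regardless of the file name.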
Thanks, I can put together the magic numbers for the remaining tracker modules
I've documented and add them.
> audio/xm audio/x-mod
> --------------------
>
> Key: TIKA-3999
> URL: https://issues.apache.org/jira/browse/TIKA-3999
> Project: Tika
> Issue Type: Sub-task
> Reporter: Tim Allison
> Priority: Major
>