[ https://issues.apache.org/jira/browse/TIKA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341993#comment-14341993 ]
Nick Burch commented on TIKA-289:
---------------------------------
There are a few issues with integrating it:
* Very few of the entries in the file magic list have mimetypes, only
descriptions, so we'd need to manually review each one and look up a
mimetype. (I see only 287 distinct mimetypes, compared with the far larger
number of magic entries.)
* Many of the file magic entries include a little bit of parser logic too,
with various bits of the matching encoded in the description string,
sometimes quite a lot of it.
* Some of the matching is actually done in code (much like our
container-aware detectors) rather than in the mime magic; see the {{src}}
directory for those.
The file magic and source code are a very good source of magic patterns, and
sometimes also of basic parser logic, but I'm not sure how practical a bulk
import would be.
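
As a rough illustration of the first point, the gap between magic entries and mimetypes can be measured by counting top-level entries against the distinct {{!:mime}} annotations in a magic file. The sketch below is an assumption-laden example, not part of Tika or file(1): the function name {{summarize_magic}} is hypothetical, the sample entries are illustrative rather than copied from the real magic list, and the parsing covers only the basic line types (comments, {{!:}} annotations, {{>}} continuation lines).

```python
def summarize_magic(text):
    """Count top-level magic entries and collect distinct !:mime values.

    Simplified reading of the file(1) magic format (an assumption):
      - blank lines and lines starting with '#' are ignored
      - '!:mime <type>' attaches a mimetype to the preceding entry
      - other '!:' lines (e.g. !:strength, !:ext) are ignored
      - lines starting with '>' are continuation tests, not new entries
      - everything else starts a new top-level entry
    """
    entries = 0
    mimes = set()
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith("!:mime"):
            parts = stripped.split()
            if len(parts) >= 2:
                mimes.add(parts[1])
            continue
        if stripped.startswith("!:") or stripped.startswith(">"):
            continue
        entries += 1
    return entries, mimes


# Illustrative sample only; the last entry is made up and has no mimetype.
SAMPLE = r"""
# Illustrative entries only
0	string	%PDF-	PDF document
!:mime	application/pdf
>5	byte	x	\b, version %c
0	string	GIF8	GIF image data
!:mime	image/gif
0	string	FOO1	Foo archive data
"""

if __name__ == "__main__":
    entries, mimes = summarize_magic(SAMPLE)
    print(entries, sorted(mimes))
```

Note that the continuation line in the sample folds a version number into the description string, which is the kind of embedded parser logic the second point refers to; a bulk import would have to decide what to do with such lines.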
> Add magic byte patterns from file(1)
> ------------------------------------
>
> Key: TIKA-289
> URL: https://issues.apache.org/jira/browse/TIKA-289
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Reporter: Jukka Zitting
> Priority: Minor
>
> As discussed in TIKA-285, the file(1) command comes with a pretty
> comprehensive set of magic byte patterns. It would be nice to get those
> patterns included in Tika as well.