[ https://issues.apache.org/jira/browse/TIKA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341993#comment-14341993 ]

Nick Burch commented on TIKA-289:
---------------------------------

There are a few issues with integrating it:
 * Very few of the entries in the file magic list have mimetypes, only 
descriptions, so we'd need to manually review each one and look up a 
mimetype. (I count only 287 distinct mimetypes, compared with the vast number 
of magic entries.)
 * Many of the file magic entries also include a little parser logic, with 
various bits of the matched data being included in the description string, 
sometimes quite a lot of it
 * Some of the matching is actually done in code (much like our 
container-aware detectors) rather than with magic patterns; see the {{src}} 
directory for those
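To gauge the first two points on a given magic file, one could tally how many top-level entries carry a `!:mime` annotation and how many descriptions pull matched values in through printf-style specifiers such as `%c` or `%d`. This is a minimal Python sketch, assuming the plain-text magic(5) source layout (offset, type, test, message; `>` marks continuation lines; `!:mime` annotates the preceding entry). The helpers `split_fields`, `scan_magic` and `summarize` are hypothetical, the sample is hand-written rather than taken from the real Magdir files, and `name`/`use` indirection and other directives are ignored.

```python
import re

# printf-style conversion in a description, e.g. "version %d" or "%s"
FORMAT_SPEC = re.compile(r"%[-#0+]*\d*(?:\.\d+)?[lh]*[cdiouxXeEfgGs]")

# Hand-written excerpt in file(1) magic syntax (illustrative, not real Magdir data)
SAMPLE = r"""
# comment lines are ignored
0       string  %PDF-   PDF document
!:mime  application/pdf
>5      byte    x       \b, version %c
>7      byte    x       \b.%c
0       string  GIF8    GIF image data
!:mime  image/gif
0       string  \x7fELF ELF
>4      byte    1       32-bit
>4      byte    2       64-bit
0       string  MZ      MS-DOS executable
"""

def split_fields(line: str) -> list:
    """Split a magic line into [offset, type, test, message].
    Backslash escapes (including an escaped space) stay inside a field."""
    fields, i, n = [], 0, len(line)
    for _ in range(3):
        while i < n and line[i] in " \t":
            i += 1
        start = i
        while i < n and line[i] not in " \t":
            i += 2 if (line[i] == "\\" and i + 1 < n) else 1
        fields.append(line[start:i])
    fields.append(line[i:].strip())  # the rest is the description
    return fields

def scan_magic(text: str) -> list:
    """Group lines into top-level entries, recording mimetypes and whether
    any description in the entry uses a printf-style format specifier."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("!:"):
            # annotation for the current entry, e.g. "!:mime application/pdf"
            if entries and line.startswith("!:mime"):
                entries[-1]["mimes"].append(line[len("!:mime"):].strip())
            continue
        if not line.startswith(">"):  # level-0 line starts a new entry
            entries.append({"mimes": [], "formatted": False})
        if entries and FORMAT_SPEC.search(split_fields(line)[3]):
            entries[-1]["formatted"] = True
    return entries

def summarize(entries: list) -> dict:
    mimes = {m for e in entries for m in e["mimes"]}
    return {
        "entries": len(entries),
        "with_mime": sum(1 for e in entries if e["mimes"]),
        "formatted": sum(1 for e in entries if e["formatted"]),
        "distinct_mimes": len(mimes),
    }

print(summarize(scan_magic(SAMPLE)))
```

On this sample it reports four entries, only two of them with a mimetype, and one (PDF) whose sub-tests format matched bytes into the description. Only the description field is checked for specifiers, so a test value like `%PDF-` is not mistaken for one.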

The file magic database and source code are a very good source of magic 
patterns, and sometimes also of basic parser logic, but I'm not sure how 
practical a bulk import would be.
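Any bulk import would most likely be partial. As a hedged sketch of what the mechanical part might look like, the hypothetical helpers `convertible` and `to_tika_xml` below accept only the simplest kind of entry (one top-level `string` test at a decimal offset, no continuation lines, no escapes, exactly one `!:mime`) and render it in the `<mime-type>`/`<magic>`/`<match>` layout used by Tika's `tika-mimetypes.xml`. Everything else comes back as `None` for manual review. The priority value is an assumed placeholder, not a recommendation.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

def convertible(entry: str) -> Optional[Tuple[str, int, str]]:
    """Return (mime, offset, value) when a file(1) magic entry is simple
    enough to map onto a single Tika <match>; otherwise None."""
    lines = [l for l in entry.strip().splitlines() if l.strip()]
    head = lines[0].split(None, 3)  # offset, type, test, message
    rest = lines[1:]
    mimes = [l.split()[1] for l in rest
             if l.startswith("!:mime") and len(l.split()) > 1]
    has_subtests = any(l.startswith(">") for l in rest)
    if len(head) < 3 or not head[0].isdigit() or head[1] != "string":
        return None  # non-string test or non-decimal offset
    if has_subtests or len(mimes) != 1 or "\\" in head[2]:
        return None  # sub-tests, missing/ambiguous mimetype, or escapes
    return mimes[0], int(head[0]), head[2]

def to_tika_xml(mime: str, offset: int, value: str, priority: int = 50) -> str:
    """Render one string match as a Tika-style <mime-type> element
    (priority 50 is only a placeholder)."""
    mime_type = ET.Element("mime-type", type=mime)
    magic = ET.SubElement(mime_type, "magic", priority=str(priority))
    ET.SubElement(magic, "match", value=value, type="string", offset=str(offset))
    return ET.tostring(mime_type, encoding="unicode")

# Hand-written entries in magic(5) syntax (illustrative only)
PDF = "0 string %PDF- PDF document\n!:mime application/pdf"
ELF = "0 string \\x7fELF ELF\n>4 byte 1 32-bit"

for name, entry in [("PDF", PDF), ("ELF", ELF)]:
    result = convertible(entry)
    print(name, to_tika_xml(*result) if result else "needs manual review")
```

The filter is deliberately strict: entries with sub-tests, formatted descriptions or no mimetype are exactly the ones the points above say would need a human to look at.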

> Add magic byte patterns from file(1)
> ------------------------------------
>
>                 Key: TIKA-289
>                 URL: https://issues.apache.org/jira/browse/TIKA-289
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> As discussed in TIKA-285, the file(1) command comes with a pretty 
> comprehensive set of magic byte patterns. It would be nice to get those 
> patterns included also in Tika.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
