[ https://issues.apache.org/jira/browse/TIKA-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820902#comment-15820902 ]
Nick Burch edited comment on TIKA-2194 at 1/12/17 12:38 PM:
------------------------------------------------------------
Ah, I've found the problem with your filename case. In the Tika mimetype
definition for matlab we have this:
{noformat}
<!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->
{noformat}
This leaves us with a problem - matlab program files don't have any universally
unique magic to spot, and they don't have a unique file extension either :(
That said, with your test file and the Tika App, we do manage to detect it
correctly as matlab just from the function definition on the first line. If you
change your line 73 to {{def sherlock = new DefaultDetector();}} then the
detection will work.
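
To make the interaction concrete, here is a rough, self-contained Java sketch of why a name-only lookup cannot return matlab while a content check can. It is a toy model and not Tika's implementation: the class {{MatlabDetectionSketch}}, its methods, the tiny glob table and the "first line starts with function" rule are all made up for illustration. The media type name {{text/x-matlab}} is also assumed here, and so is the choice to let a content match win over the filename.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: NOT Tika code. All names here are hypothetical.
class MatlabDetectionSketch {
    // Hypothetical glob table. "*.m" is claimed by Objective-C only,
    // mirroring the commented-out matlab glob quoted above.
    static final Map<String, String> GLOBS = new LinkedHashMap<>();
    static {
        GLOBS.put(".m", "text/x-objcsrc");
        GLOBS.put(".txt", "text/plain");
    }

    // Name-only lookup; returns null when no glob matches.
    static String byName(String filename) {
        for (Map.Entry<String, String> e : GLOBS.entrySet()) {
            if (filename.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    // Content-only lookup: a crude stand-in for a magic rule that
    // looks for a function definition on the first line.
    static String byContent(String content) {
        String firstLine = content.split("\n", 2)[0].trim();
        return firstLine.startsWith("function ") ? "text/x-matlab" : null;
    }

    // Assumed policy for this sketch: a content match wins, then the
    // filename glob, then a plain-text fallback.
    static String detect(String filename, String content) {
        String fromContent = byContent(content);
        if (fromContent != null) {
            return fromContent;
        }
        String fromName = byName(filename);
        return fromName != null ? fromName : "text/plain";
    }

    public static void main(String[] args) {
        String src = "function y = square(x)\n  y = x .* x;\nend\n";
        System.out.println(byName("square.m"));      // name alone
        System.out.println(detect("square.m", src)); // name plus content
    }
}
```

In this toy, the filename alone can only ever point at Objective-C, so a matlab result has to come from looking at the content.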
was (Author: gagravarr):
Ah, I've found the problem with your filename case. In the Tika mimetype
definition for matlab we have this:
{{ <!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->}}
This leaves us with a problem - matlab program files don't have any universally
unique magic to spot, and they don't have a unique file extension either :(
That said, with your test file and the Tika App, we do manage to detect it
correctly as matlab just from the function definition on the first line. If you
change your line 73 to {{def sherlock = new DefaultDetector();}} then the
detection will work.
> matlab files detected as 'text/plain'
> -------------------------------------
>
> Key: TIKA-2194
> URL: https://issues.apache.org/jira/browse/TIKA-2194
> Project: Tika
> Issue Type: Bug
> Components: detector, mime
> Affects Versions: 1.9, 1.14
> Reporter: Mihai Glont
>
> matlab files from https://issues.apache.org/jira/browse/TIKA-1634 are
> reported to have mime type 'text/plain' with either DefaultDetector or
> MimeTypes. I am able to reproduce the problem by running the following Groovy
> script https://gist.github.com/mglont/16630c8a66fdddaaa7aa44820d6f021f