[
https://issues.apache.org/jira/browse/TIKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann resolved TIKA-1634.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.9
{noformat}
[chipotle:~/tmp/tika1.9] mattmann% svn commit -m "Fix for TIKA-1634 Detecting
problem with Matlab source code contributed by Jihyun Oh <[email protected]>
this closes #49." CHANGES.txt
tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Sending CHANGES.txt
Sending
tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Transmitting file data ..
Committed revision 1683464.
[chipotle:~/tmp/tika1.9] mattmann%
{noformat}
> Detecting problem with Matlab source code
> -----------------------------------------
>
> Key: TIKA-1634
> URL: https://issues.apache.org/jira/browse/TIKA-1634
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.8
> Reporter: Ji-Hyun Oh
> Assignee: Chris A. Mattmann
> Priority: Trivial
> Labels: earthcube
> Fix For: 1.9
>
> Attachments: BARCAST_MainCode.m, Initial_Vals_Maker.m,
> custom-mimetypes.xml, tika-mimetypes.xml, wtsgaus.m
>
>
> Matlab source code and Objective-C source code share the same suffix, .m.
> Therefore, the Matlab entry in tika-mimetypes.xml relies on a magic match
> value rather than a glob pattern. In tika-mimetypes.xml, Matlab is defined as:
> <mime-type type="text/x-matlab">
> <_comment>Matlab source code</_comment>
> <magic priority="50">
> <match value="function [" type="string" offset="0"/>
> </magic>
> <!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->
> <sub-class-of type="text/plain"/>
> </mime-type>
> However, Matlab code does not always start with "function [". Therefore,
> some Matlab files are detected as text/x-objcsrc. Based on the source code
> collected from NOAA Paleoclimatology Software Resources, many Matlab files
> would match values like these (problematic files are attached as examples):
> <mime-type type="text/x-matlab">
> <_comment>Matlab source code</_comment>
> <magic priority="50">
> <match value="function" type="string" offset="0"/>
> <match value="%" type="string" offset="0"/>
> </magic>
> <!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->
> <sub-class-of type="text/plain"/>
> </mime-type>
> I ran several detection tests on different Matlab packages obtained
> from NOAA Paleoclimatology Software Resources, with and without
> custom-mimetypes.xml. Results are attached. In total, 103 Matlab
> files are detected correctly with custom-mimetypes.xml, while only 42 Matlab
> files are detected as Matlab files without custom-mimetypes.xml (i.e., with
> only the current match value). However, these match values for Matlab source
> code may be common only in the Paleoclimatology community.
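
The difference between the current and proposed match values can be sketched outside of Tika. The snippet below is only an illustration, assuming that an offset-0 string match behaves like a plain prefix test on the file's first bytes; it does not call Tika, and the function name `matches_magic` and the sample file names and contents are hypothetical.

```python
# Hypothetical stand-in for an offset-0 "string" magic match; this is not Tika's code.
ORIGINAL_MAGIC = (b"function [",)        # current rule in tika-mimetypes.xml
PROPOSED_MAGIC = (b"function", b"%")     # values proposed in this issue


def matches_magic(data: bytes, magics) -> bool:
    """Return True if data begins (offset 0) with any of the given byte strings."""
    return any(data.startswith(m) for m in magics)


# Made-up sample contents, chosen only to exercise each rule.
samples = {
    "bracketed_function.m": b"function [y] = f(x)\n",
    "plain_function.m": b"function y = f(x)\n",
    "comment_header.m": b"% Initial values\nx = 1;\n",
    "objc_source.m": b"#import <Foundation/Foundation.h>\n",
}

for name, data in samples.items():
    print(
        name,
        "original:", matches_magic(data, ORIGINAL_MAGIC),
        "proposed:", matches_magic(data, PROPOSED_MAGIC),
    )
```

Under this simplified model, only the first sample satisfies the current rule, while the proposed values also catch files that open with a plain `function` declaration or a `%` comment. A bare `%` prefix is broad, so other text formats that begin with `%` would satisfy the same test.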