[ 
https://issues.apache.org/jira/browse/TIKA-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030686#comment-17030686
 ] 

Nick Burch commented on TIKA-3034:
----------------------------------

We tend to do 3ish releases a year. Last release was in December, so it's 
probably a month or two until there will be enough bug fixes / dependency 
upgrades / new features to convince our hard-working release manager 
[~tallison] to tackle it!

In the meantime, if you put the latest Tika mimetypes file on your classpath 
ahead of the Tika Core jar, it'll be used instead of the built-in one. That 
avoids the need to use a full nightly build while you wait.
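
To check that the override is actually first on the classpath, something like 
the sketch below may help. It uses only the JDK, and it assumes the file sits 
at the resource path org/apache/tika/mime/tika-mimetypes.xml (as in the Tika 
source tree) and that Tika finds it through ordinary classpath lookup. The 
class name MimeTypesOnClasspath is made up for this example.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MimeTypesOnClasspath {
    // Assumed resource path of the mimetypes file inside the Tika Core jar.
    static final String TIKA_MIMETYPES = "org/apache/tika/mime/tika-mimetypes.xml";

    // Returns every classpath location that provides the resource,
    // in the order the class loader reports them.
    static List<URL> locate(String resource) throws IOException {
        ClassLoader loader = MimeTypesOnClasspath.class.getClassLoader();
        return new ArrayList<>(Collections.list(loader.getResources(resource)));
    }

    public static void main(String[] args) throws IOException {
        List<URL> found = locate(TIKA_MIMETYPES);
        if (found.isEmpty()) {
            System.out.println("No tika-mimetypes.xml found on the classpath");
            return;
        }
        // The first entry is the copy a plain getResource() lookup would return.
        System.out.println("First on classpath: " + found.get(0));
        for (URL other : found.subList(1, found.size())) {
            System.out.println("Shadowed copy: " + other);
        }
    }
}
```

Run it with the same classpath as your application. If the first entry 
points into the Tika Core jar rather than at your newer copy, the override 
directory or jar needs to move earlier in the classpath.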

> Detector always returns text/plain when scanning Mathematica files
> ------------------------------------------------------------------
>
>                 Key: TIKA-3034
>                 URL: https://issues.apache.org/jira/browse/TIKA-3034
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.23
>            Reporter: Tung Nguyen
>            Priority: Blocker
>             Fix For: 1.23
>
>
> We are using Tika to implement our MIME type detection module. The 
> library does not seem to detect Mathematica files, although the documentation 
> says it does [1]. The Tika detector always returns `text/plain` instead 
> of `application/mathematica`, which is what the documentation and the 
> unit tests [2] describe.
> Doing the same with the Python code below, we obtain the right 
> MIME type for any Mathematica file downloaded from the Wolfram Library 
> Archive [3]. 
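> {code:python}
> #!/usr/bin/python3
> import mimetypes
> import sys
>
> # Path of the file to check, given as the first command-line argument.
> test_file = sys.argv[1]
> # guess_type() works from the file name extension, not the file contents.
> print(mimetypes.MimeTypes().guess_type(test_file)[0])
> {code}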
> Therefore, we suspect there is a bug in the Tika detector when it tries to 
> guess MIME types for Mathematica files.
> References:
>  [1] [https://tika.apache.org/1.23/formats.html]
>  [2] [https://github.com/apache/tika/blob/master/tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java#L64]
>  [3] [https://library.wolfram.com/infocenter/Courseware/4706/]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
