[
https://issues.apache.org/jira/browse/TIKA-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280787#comment-14280787
]
Matthew Jones commented on TIKA-1520:
-------------------------------------
I'm not sure what you mean by signature, but looking at the file, it is
ASCII-readable text. All of the examples I have start with the line
(* Content-type: application/mathematica *)
These older files (see http://www.uta.edu/math/pages/main/complab.htm)
have the line
(************** Content-type: application/mathematica **************
Since version 3 the extension has been .nb (before that it was .ma).
fileext.com says:
Identifying characters Hex: 28 2A , ASCII: (*
It seems like just this first line has a "(*" and application/mathematica in it,
if that helps at all.
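To make the idea concrete, here is a minimal Python sketch of a first-line check. It is not Tika code: the helper name looks_like_mathematica and the regular expression are my own assumptions, based only on the two header variants quoted above (one asterisk, or a run of asterisks, around the Content-type text).

```python
import re

# Hypothetical pattern, derived only from the two header lines quoted above:
# "(*" followed by any number of extra asterisks, the Content-type text,
# then at least one closing asterisk.
HEADER_RE = re.compile(
    rb"^\(\*+\s*Content-type:\s*application/mathematica\s*\*+"
)

def looks_like_mathematica(head: bytes) -> bool:
    """Return True if the first line of `head` resembles a notebook header."""
    first_line = head.split(b"\n", 1)[0]
    return HEADER_RE.match(first_line) is not None
```

Passing the first few hundred bytes of a file should be enough, since only the first line is inspected. A bare "(*" is not treated as a match on its own, because that prefix would presumably also appear in other files that begin with a comment in the same syntax.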
> Provide parsing and detection for Mathematica files
> ---------------------------------------------------
>
> Key: TIKA-1520
> URL: https://issues.apache.org/jira/browse/TIKA-1520
> Project: Tika
> Issue Type: Wish
> Components: parser
> Reporter: Matthew Jones
> Attachments: lab0.nb
>
>
> Currently Mathematica notebooks that have data in them do not appear to be
> detected correctly.
> java -jar tika-app-1.7.jar -d lab0.nb
> text/plain
> An empty file with the .nb extension though is detected correctly. ;)
> touch testmath.nb
> java -jar tika-app-1.7.jar -d testmath.nb
> application/mathematica
> I'm not too sure how to fix this so just adding it as a wish. Thanks!
> Examples on this page
> http://www2.stetson.edu/~mhale/calc2/math.htm
> including the file
> http://www2.stetson.edu/~mhale/calc2/lab0.nb
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)