[ https://issues.apache.org/jira/browse/TIKA-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280787#comment-14280787 ]
Matthew Jones commented on TIKA-1520:
-------------------------------------

I'm not sure what you mean by signature, but looking at the file, it is ASCII-readable text.

All of the examples I have start with the line

(* Content-type: application/mathematica *)

The older files listed at
http://www.uta.edu/math/pages/main/complab.htm
have the line

(************** Content-type: application/mathematica **************

Since version 3 the extension has been .nb (before that it was .ma).

fileext.com says:
Identifying characters Hex: 28 2A , ASCII: (*

It seems like just this first line has a "(*" and application/mathematica in it, if that helps at all.

> Provide parsing and detection for Mathematica files
> ---------------------------------------------------
>
>                 Key: TIKA-1520
>                 URL: https://issues.apache.org/jira/browse/TIKA-1520
>             Project: Tika
>          Issue Type: Wish
>          Components: parser
>            Reporter: Matthew Jones
>         Attachments: lab0.nb
>
>
> Currently Mathematica notebooks that have data in them do not appear to be
> detected correctly.
> java -jar tika-app-1.7.jar -d lab0.nb
> text/plain
> An empty file with the .nb extension though is detected correctly. ;)
> touch testmath.nb
> java -jar tika-app-1.7.jar -d testmath.nb
> application/mathematica
> I'm not too sure how to fix this so just adding it as a wish. Thanks!
> Examples on this page
> http://www2.stetson.edu/~mhale/calc2/math.htm
> including the file
> http://www2.stetson.edu/~mhale/calc2/lab0.nb

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
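
The header check described in the comment (leading bytes 28 2A, i.e. "(*", plus the text "Content-type: application/mathematica" on the first line) could be sketched roughly as follows. This is only an illustration in plain Java, not Tika code: the class name MathematicaSniffer, the method looksLikeMathematica, and the 256-byte read size are all made up for the example, and it assumes the header really is on the first line of every notebook.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika.
class MathematicaSniffer {

    // Header text quoted in the comment; assumed to appear on the first line.
    private static final String MARKER = "Content-type: application/mathematica";

    // Returns true if the data starts with "(*" and its first line contains MARKER.
    static boolean looksLikeMathematica(byte[] head) {
        // Magic bytes 0x28 0x2A are the ASCII characters "(*".
        if (head == null || head.length < 2 || head[0] != 0x28 || head[1] != 0x2A) {
            return false;
        }
        String text = new String(head, StandardCharsets.US_ASCII);
        // Find the end of the first line (LF or CR), or use everything we have.
        int end = text.length();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                end = i;
                break;
            }
        }
        return text.substring(0, end).contains(MARKER);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MathematicaSniffer <file>");
            return;
        }
        // A single read of up to 256 bytes is enough for a sketch;
        // it may return fewer bytes than requested.
        byte[] buf = new byte[256];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.read(buf);
        }
        byte[] head = Arrays.copyOf(buf, Math.max(n, 0));
        System.out.println(looksLikeMathematica(head) ? "application/mathematica" : "unknown");
    }
}
```

Both header variants quoted above, the "(* ... *)" form and the older "(************** ..." form, start with "(*" and contain the marker text, so one check covers them. In Tika itself a rule like this would more likely be expressed as a magic entry in the MIME type definitions than as Java code; the sketch is only meant to make the proposed rule concrete.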