[ 
https://issues.apache.org/jira/browse/TIKA-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280787#comment-14280787
 ] 

Matthew Jones commented on TIKA-1520:
-------------------------------------

I'm not sure what you mean by signature, but looking at the file, it is 
ASCII-readable text. All of the examples I have start with the line

(* Content-type: application/mathematica *)

The older files at
http://www.uta.edu/math/pages/main/complab.htm
have the line

(************** Content-type: application/mathematica **************

Since version 3 the extension has been .nb (before that it was .ma).

fileext.com says:

Identifying characters Hex: 28 2A , ASCII: (*

It seems like just this first line has a "(*" and application/mathematica in it, 
if that helps at all.
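
To make the idea concrete, here is a rough sketch of that first-line check. It is 
Python purely for illustration and is not Tika code; the function name 
looks_like_mathematica is made up, and I'm assuming the header always sits on 
the very first line, which is only what I've seen in the examples above.

```python
def looks_like_mathematica(data: bytes) -> bool:
    """Hypothetical check, not part of Tika: does the first line look like
    a Mathematica notebook header?"""
    # Only the first line is inspected, per the observation above.
    first_line = data.split(b"\n", 1)[0]
    # The line must open a Mathematica comment, bytes 28 2A, i.e. "(*" ...
    starts_with_comment = first_line.startswith(b"(*")
    # ... and mention the content type somewhere on that same line.
    names_content_type = b"application/mathematica" in first_line
    return starts_with_comment and names_content_type


if __name__ == "__main__":
    newer = b"(* Content-type: application/mathematica *)\nrest of notebook\n"
    print(looks_like_mathematica(newer))
```

Both header styles would pass this check, since the older one also begins with 
"(*" and only adds more asterisks.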

> Provide parsing and detection for Mathematica files
> ---------------------------------------------------
>
>                 Key: TIKA-1520
>                 URL: https://issues.apache.org/jira/browse/TIKA-1520
>             Project: Tika
>          Issue Type: Wish
>          Components: parser
>            Reporter: Matthew Jones
>         Attachments: lab0.nb
>
>
> Currently Mathematica notebooks that have data in them do not appear to be 
> detected correctly.
> java -jar tika-app-1.7.jar -d lab0.nb           
> text/plain
> An empty file with the .nb extension though is detected correctly. ;)
> touch testmath.nb
> java -jar tika-app-1.7.jar -d testmath.nb            
> application/mathematica
> I'm not too sure how to fix this so just adding it as a wish. Thanks!
> Examples on this page 
> http://www2.stetson.edu/~mhale/calc2/math.htm
> including the file
> http://www2.stetson.edu/~mhale/calc2/lab0.nb



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
