Gregory Lepore created TIKA-4074:
------------------------------------

             Summary: Add magic for TeX Virtual Font format
                 Key: TIKA-4074
                 URL: https://issues.apache.org/jira/browse/TIKA-4074
             Project: Tika
          Issue Type: Sub-task
            Reporter: Gregory Lepore
         Attachments: aebx10.vf, aebx12.vf, aebxsl10.vf

The TeX Virtual Font (VF) format occurs 6,047 times in the second most recent
Common Crawl dataset. It has no known MIME type. The magic is:

F7CA{9}F300{4}0010 at offset 0 (hex bytes; {n} stands for n bytes of any value).
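A minimal Python sketch of how this signature could be checked against the start of a file. It reads the signature as F7 CA, nine bytes of any value, F3 00, four bytes of any value, then 00 10, all anchored at offset 0. `VF_SIGNATURE` and `matches_vf_magic` are hypothetical names for illustration, not part of Tika.

```python
import sys
from typing import Optional, Tuple

# Hypothetical encoding of the proposed signature: one entry per byte from
# offset 0; None means "any byte value" (a wildcard position).
VF_SIGNATURE: Tuple[Optional[int], ...] = (
    (0xF7, 0xCA)
    + (None,) * 9
    + (0xF3, 0x00)
    + (None,) * 4
    + (0x00, 0x10)
)


def matches_vf_magic(data: bytes) -> bool:
    """Return True if `data` begins with the proposed TeX VF signature."""
    if len(data) < len(VF_SIGNATURE):
        return False
    return all(
        expected is None or data[i] == expected
        for i, expected in enumerate(VF_SIGNATURE)
    )


if __name__ == "__main__":
    # Usage: python vf_magic.py FILE...  (prints one line per file)
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            head = f.read(len(VF_SIGNATURE))
        print(f"{path}: {'match' if matches_vf_magic(head) else 'no match'}")
```

For example, `python vf_magic.py aebx10.vf aebx12.vf aebxsl10.vf` would check the attached samples (`vf_magic.py` is just an assumed file name).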

The above signature catches most TeX VF files, although some are missed.
There were no false positives, however, so I think it is a good compromise
that catches the majority of the sample files.
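One plausible reason some files are missed, assuming the VF layout described in Knuth's VFtoVP documentation: a VF file starts with pre (0xF7 = 247), the VF id byte (0xCA = 202), a one-byte comment length k, k comment bytes, a 4-byte checksum and a 4-byte design size, followed by font definitions (fnt_def1 is 0xF3 = 243). Read this way, the nine wildcard bytes would be k = 0 plus the checksum and design size, F3 00 would be fnt_def1 for font number 0, and 00 10 would be the start of a scale factor of 1.0 as a fix_word (0x00100000). A file with a non-empty preamble comment then pushes the first font definition past offset 11, so the fixed-offset signature no longer lines up. The hypothetical helper `vf_preamble_info` below parses the preamble instead of relying on fixed offsets; it is a sketch for exploring misses, not a proposed Tika detector.

```python
from typing import Optional

VF_PRE = 0xF7     # pre opcode (247), assumed from the VF format description
VF_ID = 0xCA      # VF identification byte (202)
FNT_DEF1 = 0xF3   # fnt_def1 (243); fnt_def2..fnt_def4 follow as 0xF4..0xF6


def vf_preamble_info(data: bytes) -> Optional[dict]:
    """Return preamble details for a VF header, or None if it does not look like VF.

    Assumed layout: pre, id, k[1], comment x[k], checksum cs[4], design size ds[4],
    then the font definitions.
    """
    if len(data) < 3 or data[0] != VF_PRE or data[1] != VF_ID:
        return None
    k = data[2]                  # comment length
    first_cmd = 3 + k + 8        # skip comment, checksum and design size
    if len(data) <= first_cmd:
        return None              # header too short to reach the first command
    opcode = data[first_cmd]
    return {
        "comment_length": k,
        "first_command_offset": first_cmd,   # 11 when there is no comment
        "first_command_is_fnt_def": FNT_DEF1 <= opcode <= FNT_DEF1 + 3,
    }
```

Counting how often `comment_length` is non-zero, or `first_command_is_fnt_def` is false, among files the signature misses could show whether a comment-tolerant rule is worth adding.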

It would be nice to see the results of additional testing.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
