Gregory Lepore created TIKA-4088:
------------------------------------

             Summary: Add magic for SEG Y format
                 Key: TIKA-4088
                 URL: https://issues.apache.org/jira/browse/TIKA-4088
             Project: Tika
          Issue Type: Sub-task
            Reporter: Gregory Lepore
         Attachments: 
0b518f422c100574e3ae8963842bd18bdb4ad27254022fcb6dafc7fe1b7d366c, 
38a58826f54edafebab0bb4381f2b39f7b9b6c04fbba3c948f18fe6861bd14d1, 
79d81ddaad3582d71b596c819e0c61d43b092f8dbafb1d4400199673cfce0a8f

The SEG Y format occurs roughly 2,390 times in the latest Common Crawl 
dataset. It has no known MIME type. The magic is:

Offset 0: C340(F1|40)40

Offset 80: C3

Offset 160: C3


The C3 byte then repeats every 80 bytes, 38 more times. However, the three 
offsets above matched all SEG Y files in my test collections, with no false 
positives, so they should be good enough.


The file extension is .segy.


[https://web.archive.org/web/20160312030348/http://www.seg.org/resources/publications/misc/technical-standards]


A different signature is listed in PRONOM:

https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1110&strPageToDisplay=signatures



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
