Gregory Lepore created TIKA-4088:
------------------------------------
Summary: Add magic for SEG Y format
Key: TIKA-4088
URL: https://issues.apache.org/jira/browse/TIKA-4088
Project: Tika
Issue Type: Sub-task
Reporter: Gregory Lepore
Attachments:
0b518f422c100574e3ae8963842bd18bdb4ad27254022fcb6dafc7fe1b7d366c,
38a58826f54edafebab0bb4381f2b39f7b9b6c04fbba3c948f18fe6861bd14d1,
79d81ddaad3582d71b596c819e0c61d43b092f8dbafb1d4400199673cfce0a8f
The SEG Y format occurs roughly 2,390 times in the latest Common Crawl
dataset. It has no known MIME type. The magic (hex bytes) is:
Offset 0: C340(F1|40)40
Offset 80: C3
Offset 160: C3
An additional C3 follows every 80 bytes, 38 more times. However, the three
offsets above matched all SEG Y files in my test collections, with no false
positives, so they should be good enough.
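
For illustration only, the three-offset check above could be sketched in Python as follows. The function name `looks_like_segy` is hypothetical and is not part of Tika; the sketch only mirrors the byte values and offsets listed in this issue and does not check the further C3 bytes.

```python
def looks_like_segy(data: bytes) -> bool:
    """Return True if data matches the three-offset magic from this issue.

    Hypothetical helper for illustration; not Tika's detector.
    """
    # Offset 160 must exist, so at least 161 bytes are needed.
    if len(data) < 161:
        return False
    # Offset 0: C3 40 (F1|40) 40
    head_ok = (
        data[0] == 0xC3
        and data[1] == 0x40
        and data[2] in (0xF1, 0x40)
        and data[3] == 0x40
    )
    # Offset 80 and offset 160: C3
    return head_ok and data[80] == 0xC3 and data[160] == 0xC3
```

Reading the first 161 bytes of a candidate file and passing them to this function is enough to apply the check.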
File extension is .segy.
[https://web.archive.org/web/20160312030348/http://www.seg.org/resources/publications/misc/technical-standards]
PRONOM lists a different signature at:
https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1110&strPageToDisplay=signatures