[
https://issues.apache.org/jira/browse/TIKA-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gregory Lepore updated TIKA-4088:
---------------------------------
Description:
The SEG Y format occurs roughly 2,390 times in the latest Common Crawl
dataset. It has no known MIME type. The magic is:
Offset 0: C340(F1|40)40
Offset 80: C3
Offset 160: C3
An additional C3 follows every 80 bytes, 38 more times. However, the three
offsets above matched all SEG Y files in my test collections, with no false
positives, so they should be good enough.
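
For illustration, the three-offset check above could be sketched in Python as
follows. This is only a minimal sketch, not the Tika magic definition; the
function name looks_like_segy is hypothetical.

```python
import re

# Signature from the description: C3 40 (F1|40) 40 at offset 0,
# plus a single C3 byte at offsets 80 and 160.
_HEAD = re.compile(rb"\xC3\x40[\xF1\x40]\x40")


def looks_like_segy(data: bytes) -> bool:
    """Return True if the leading bytes match the three-offset signature."""
    if len(data) < 161:
        # Offset 160 must be readable.
        return False
    return (
        _HEAD.match(data) is not None
        and data[80] == 0xC3
        and data[160] == 0xC3
    )


# Synthetic 3200-byte header: 40 lines of 80 bytes, each starting with C3.
line = b"\xC3" + b"\x40" * 79
header = line * 40
print(looks_like_segy(header))
```

The check stops at offset 160 rather than testing every 80-byte line, which
mirrors the shortened signature proposed here.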
File extensions are .segy and .sgy.
[https://web.archive.org/web/20160312030348/http://www.seg.org/resources/publications/misc/technical-standards]
A different signature at:
[https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1110&strPageToDisplay=signatures]
was:
The SEG Y format occurs 2,390 times (roughly) in the latest Common Crawl
dataset. No known mime type. Magic is:
Offset 0: C340(F1|40)40
Offset 80: C3
Offset 160: C3
With additional C3 every 80 bytes 38 more times. However, the above matched all
SEG Y files in my test collections, with no false positives, so it should be
good enough.
File extension is .segy.
[https://web.archive.org/web/20160312030348/http://www.seg.org/resources/publications/misc/technical-standards]
A different signature at:
https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1110&strPageToDisplay=signatures
> Add magic for SEG Y format
> --------------------------
>
> Key: TIKA-4088
> URL: https://issues.apache.org/jira/browse/TIKA-4088
> Project: Tika
> Issue Type: Sub-task
> Reporter: Gregory Lepore
> Priority: Minor
> Attachments:
> 0b518f422c100574e3ae8963842bd18bdb4ad27254022fcb6dafc7fe1b7d366c,
> 38a58826f54edafebab0bb4381f2b39f7b9b6c04fbba3c948f18fe6861bd14d1,
> 79d81ddaad3582d71b596c819e0c61d43b092f8dbafb1d4400199673cfce0a8f
>
>
> The SEG Y format occurs roughly 2,390 times in the latest Common Crawl
> dataset. It has no known MIME type. The magic is:
> Offset 0: C340(F1|40)40
> Offset 80: C3
> Offset 160: C3
>
> An additional C3 follows every 80 bytes, 38 more times. However, the three
> offsets above matched all SEG Y files in my test collections, with no false
> positives, so they should be good enough.
>
> File extensions are .segy and .sgy.
>
> [https://web.archive.org/web/20160312030348/http://www.seg.org/resources/publications/misc/technical-standards]
>
> A different signature at:
>
> [https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1110&strPageToDisplay=signatures]