Gregory Lepore created TIKA-4075:
------------------------------------
Summary: Add magic for GRAPPA Database RADX File
Key: TIKA-4075
URL: https://issues.apache.org/jira/browse/TIKA-4075
Project: Tika
Issue Type: Sub-task
Reporter: Gregory Lepore
Attachments: 359_50N15_25.RADX, 359_50N28_00.RADX, 359_50S17_50.RADX
The GRAPPA Database RADX File format occurs 8,343 times in the second most
recent Common Crawl dataset. No known mime type.
It's probably best to leave implementation of this for a day with nothing else
to do. It's a very complex signature. Actually there are two versions of the
signature due to vagaries of the PRONOM signature language.
Both are at offset 0:
80{14}(40|3F){7}(C0|40){2}(00|FF){1}(00|FF){1}(FF|00){8-9}(00|80){53-54}(40|3F)
and
00{14}(40|3F){7}(C0|40){2}(00|FF){1}(00|FF){1}(FF|00){8-9}(00|80){53-54}(40|3F)
Which can be simplified (if that's possible) to:
(80|00){14}(40|3F){7}(C0|40){2}(00|FF){1}(00|FF){1}(FF|00){8-9}(00|80){53-54}(40|3F)
The {8-9} notation is a variable-length gap: the next value (either 00 or 80)
can occur after either 8 or 9 bytes. {53-54} works the same way. See sample files.
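As a rough illustration of how the signature reads, here is a minimal Python sketch that translates the combined form of the two signatures above (leading byte 80 or 00) into a byte regex anchored at offset 0. The helper names (pronom_to_regex, synthetic_header) are hypothetical, the parser covers only the three token kinds used in this signature, and the test header is synthetic filler rather than one of the attached sample files. It is not the Tika implementation.

```python
import re

# Tokens used by this signature: (AA|BB) alternatives, {n} or {n-m} gaps, plain hex bytes.
TOKEN = re.compile(
    r"\((?P<alts>[0-9A-Fa-f]{2}(?:\|[0-9A-Fa-f]{2})*)\)"
    r"|\{(?P<lo>\d+)(?:-(?P<hi>\d+))?\}"
    r"|(?P<byte>[0-9A-Fa-f]{2})"
)

def pronom_to_regex(signature):
    """Translate a small subset of PRONOM byte-sequence syntax into a bytes regex."""
    parts = []
    pos = 0
    while pos < len(signature):
        m = TOKEN.match(signature, pos)
        if m is None:
            raise ValueError("unsupported token at position %d" % pos)
        if m.group("alts"):
            alts = [re.escape(bytes.fromhex(h)) for h in m.group("alts").split("|")]
            parts.append(b"(?:" + b"|".join(alts) + b")")
        elif m.group("lo"):
            lo = int(m.group("lo"))
            hi = int(m.group("hi")) if m.group("hi") else lo
            # A gap of lo..hi bytes of any value.
            parts.append(b".{%d,%d}" % (lo, hi))
        else:
            parts.append(re.escape(bytes.fromhex(m.group("byte"))))
        pos = m.end()
    # DOTALL so that "." also matches the byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)

SIGNATURE = ("(80|00){14}(40|3F){7}(C0|40){2}(00|FF){1}(00|FF){1}"
             "(FF|00){8-9}(00|80){53-54}(40|3F)")
RADX = pronom_to_regex(SIGNATURE)

def synthetic_header(first=0x80, gap1=8, gap2=53, filler=0x11):
    """Build a made-up header that follows the signature layout (not real RADX data)."""
    f = bytes([filler])
    return (bytes([first]) + f * 14 + b"\x40" + f * 7 + b"\xC0" + f * 2 + b"\x00"
            + f + b"\xFF" + f + b"\xFF" + f * gap1 + b"\x00" + f * gap2 + b"\x3F")

def looks_like_radx(data):
    # Pattern.match anchors at the start of the data, i.e. offset 0.
    return RADX.match(data) is not None
```

Calling looks_like_radx on the first hundred or so bytes of a file should be enough, since the longest possible match is 96 bytes.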
See [https://ftp.imcce.fr/pub/catalogs/GRAPPA/lisez%20moi.txt] (in French) for
more information.