Gregory Lepore created TIKA-4075:
------------------------------------

             Summary: Add magic for GRAPPA Database RADX File
                 Key: TIKA-4075
                 URL: https://issues.apache.org/jira/browse/TIKA-4075
             Project: Tika
          Issue Type: Sub-task
            Reporter: Gregory Lepore
         Attachments: 359_50N15_25.RADX, 359_50N28_00.RADX, 359_50S17_50.RADX

The GRAPPA Database RADX File format occurs 8,343 times in the second most 
recent Common Crawl dataset. No known mime type. 

 

It's probably best to leave implementation of this for a day with nothing else 
to do. It's a very complex signature. Actually there are two versions of the 
signature due to vagaries of the PRONOM signature language.

 

Both are at offset 0:

 

80\{14}(40|3F)\{7}(C0|40)\{2}(00|FF)\{1}(00|FF)\{1}(FF|00)\{8-9}(00|80)\{53-54}(40|3F)

 

and

 

00\{14}(40|3F)\{7}(C0|40)\{2}(00|FF)\{1}(00|FF)\{1}(FF|00)\{8-9}(00|80)\{53-54}(40|3F)

 

Which can be simplified (if that's possible) to:

 

(80|00)\{14}(40|3F)\{7}(C0|40)\{2}(00|FF)\{1}(00|FF)\{1}(FF|00)\{8-9}(00|80)\{53-54}(40|3F)

The \{8-9} notation denotes a variable-length gap: the following value (either 
00 or 80) can occur after either 8 or 9 arbitrary bytes. \{53-54} works the 
same way. See sample files.
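
For anyone who wants to test the signature against the attached samples before writing the Tika magic, here is a minimal sketch of the two offset-0 signatures as a Python byte regex. It assumes a leading byte of 80 or 00 (taken from the two full signatures above) and reads every \{n} as n wildcard bytes. The names RADX_MAGIC, looks_like_radx and build_synthetic_header are hypothetical, and the synthetic header is made-up test data, not a real RADX file.

```python
import re

# PRONOM-style signature translated to a bytes regex (assumption: {n} = n wildcard bytes).
# re.DOTALL is needed so "." also matches the byte 0x0A.
RADX_MAGIC = re.compile(
    rb"[\x80\x00]"            # leading byte: 80 or 00
    rb".{14}[\x40\x3F]"       # 14-byte gap, then 40 or 3F
    rb".{7}[\xC0\x40]"        # 7-byte gap, then C0 or 40
    rb".{2}[\x00\xFF]"        # 2-byte gap, then 00 or FF
    rb".{1}[\x00\xFF]"        # 1-byte gap, then 00 or FF
    rb".{1}[\xFF\x00]"        # 1-byte gap, then FF or 00
    rb".{8,9}[\x00\x80]"      # 8- or 9-byte gap, then 00 or 80
    rb".{53,54}[\x40\x3F]",   # 53- or 54-byte gap, then 40 or 3F
    re.DOTALL,
)


def looks_like_radx(data: bytes) -> bool:
    """Return True if the signature matches at offset 0 of data."""
    return RADX_MAGIC.match(data) is not None


def build_synthetic_header(first: int, gap_a: int = 8, gap_b: int = 53) -> bytes:
    """Build made-up bytes that follow the signature layout (hypothetical test data)."""
    fill = b"\x11"  # filler byte that is not one of the signature's fixed values
    return (
        bytes([first]) + fill * 14 + b"\x40" + fill * 7 + b"\xC0"
        + fill * 2 + b"\x00" + fill + b"\xFF" + fill + b"\xFF"
        + fill * gap_a + b"\x80" + fill * gap_b + b"\x3F"
    )


if __name__ == "__main__":
    print(looks_like_radx(build_synthetic_header(0x80)))
    print(looks_like_radx(build_synthetic_header(0x00, gap_a=9, gap_b=54)))
```

To check a real file, read at least the first 96 bytes (the longest possible match) and pass them to looks_like_radx. A regex is used here only for quick verification. Whatever ends up in Tika would need the same gaps expressed in its own magic syntax.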

 

See [https://ftp.imcce.fr/pub/catalogs/GRAPPA/lisez%20moi.txt] (in French) for 
more information.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)