Gregory Lepore created TIKA-4067:
------------------------------------

             Summary: Add magic for ASPRS Lidar data
                 Key: TIKA-4067
                 URL: https://issues.apache.org/jira/browse/TIKA-4067
             Project: Tika
          Issue Type: Sub-task
            Reporter: Gregory Lepore
         Attachments: 
0a0b002a319eea990e13da3d197fe4948e2cb8b72d02d5fa19c42382b1548a63, 
0a0b757ac6ca7692a14645aed1ff2c2f5d7db11533087dac1855b521929d71c0, 
0a0d0890d50d693831e5e3bb9f08e927760609408eaa5d86cf6e0d56e122e0e1

The ASPRS Lidar data format occurs over 11,000 times in the latest Common Crawl 
dataset. There are three signatures, one for each of versions 1.0, 1.1, and 
1.2. There does not appear to be an existing MIME type that covers this format.

The full signatures are below, but they can be simplified to:

 

4C415346\{20}01(00|01|02)

 

which is ASCII "LASF" at offset 0, followed after 20 arbitrary bytes by one of 
the hex byte pairs 0100, 0101, or 0102 (offsets 24 and 25) for versions 1.0, 
1.1, and 1.2.
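
As a rough illustration of the simplified signature, here is a minimal Python 
sketch. It assumes the backslash before "{20}" is Jira wiki escaping rather 
than part of the pattern, and that "{20}" means "skip exactly 20 bytes". The 
names LAS_MAGIC and las_version are hypothetical and not part of Tika.

```python
import re
from typing import Optional

# Simplified signature from the issue: 4C415346{20}01(00|01|02)
# "LASF", then 20 arbitrary bytes, then 0x01 followed by 0x00, 0x01 or 0x02.
# re.DOTALL lets "." match any byte value, including 0x0A.
LAS_MAGIC = re.compile(rb"LASF.{20}\x01[\x00-\x02]", re.DOTALL)


def las_version(header: bytes) -> Optional[str]:
    """Return "1.0", "1.1" or "1.2" if the header matches, else None."""
    if LAS_MAGIC.match(header) is None:
        return None
    # The two version bytes sit at offsets 24 and 25 (4 + 20 skipped bytes).
    return f"{header[24]}.{header[25]}"


if __name__ == "__main__":
    sample = b"LASF" + bytes(20) + b"\x01\x02"
    print(las_version(sample))
```

Only the first 26 bytes of a file are needed for this check, so a detector 
could read a small prefix instead of the whole file.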

 
||External signatures|File extension: las
File extension: laz|
||Internal signatures|
||Name|ASPRS Lidar Data Exchange Format 1.0|
||Description|ASCII header: LASF, followed after 20 bytes by version number 1.0|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Byte order| |
||Value|4C415346\{20}0100\{78}[00:99]|
|
|

 
||External signatures|File extension: las
File extension: laz|
||Internal signatures|
||Name|ASPRS Lidar Data Exchange Format 1.1|
||Description|ASCII header: LASF, followed after 20 bytes by version number 1.1|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Byte order| |
||Value|4C415346\{20}0101\{78}[00:99]|
|
|

 
||External signatures|File extension: las
File extension: laz|
||Internal signatures|
||Name|ASPRS Lidar Data Exchange Format 1.2|
||Description|ASCII header: LASF, followed after 20 bytes by version number 1.2|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Byte order| |
||Value|4C415346\{20}0102\{78}[00:99]|
|
|
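
The full signatures in the tables above add one more constraint after the 
version bytes. The sketch below checks it in Python, under the assumption that 
the Value fields use PRONOM-style syntax: "{78}" skips 78 bytes and "[00:99]" 
is an inclusive hex byte range. That places the range-checked byte at offset 
104 (4 + 20 + 2 + 78). The function name is hypothetical.

```python
def matches_full_las_signature(header: bytes, minor: int) -> bool:
    """Check 4C415346{20}01<minor>{78}[00:99] against the start of a file.

    minor is 0, 1 or 2 for versions 1.0, 1.1 and 1.2.
    """
    if len(header) < 105:
        # Too short to contain the byte at offset 104.
        return False
    return (
        header[0:4] == b"LASF"              # 4C415346 at offset 0
        and header[24] == 0x01              # first version byte, after 20 skipped bytes
        and header[25] == minor             # second version byte
        and 0x00 <= header[104] <= 0x99     # [00:99], after 78 more skipped bytes
    )


if __name__ == "__main__":
    buf = bytearray(105)
    buf[0:4] = b"LASF"
    buf[24], buf[25], buf[104] = 0x01, 0x01, 0x03
    print(matches_full_las_signature(bytes(buf), 1))
```

Compared with the simplified form, this needs at least 105 bytes of input, 
which is the trade-off for the stricter match.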

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
