[
https://issues.apache.org/jira/browse/TIKA-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778821#comment-17778821
]
Josh McCullough commented on TIKA-4067:
---------------------------------------
File-type detection for {{las}} files returns {{application/octet-stream}},
while {{laz}} files are detected correctly. This is with Tika {{2.9.1}} and
{{tika-parsers-standard-package}}.
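
To check whether a given las file falls inside what the simplified signature in this issue covers, a minimal sketch like the one below can report what sits at the relevant offsets. It assumes the layout described in the issue (ASCII LASF at offset 0, then 20 bytes, then the two version bytes at offsets 24 and 25). The function name describe_las_header is hypothetical and this is not Tika code.

```python
import sys


def describe_las_header(data: bytes) -> dict:
    """Summarise the bytes that the simplified TIKA-4067 signature inspects."""
    if len(data) < 26:
        raise ValueError("need at least 26 bytes to read signature and version")
    signature_ok = data[0:4] == b"LASF"
    major, minor = data[24], data[25]
    return {
        "signature_ok": signature_ok,
        "version": f"{major}.{minor}",
        # The simplified signature accepts only 01 followed by 00, 01, or 02.
        "matches_simplified_magic": signature_ok and major == 1 and minor in (0, 1, 2),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_las.py path/to/file.las
    with open(sys.argv[1], "rb") as fh:
        print(describe_las_header(fh.read(26)))
```

A file whose version bytes are outside 0100 to 0102 would report matches_simplified_magic as False, which may be worth ruling out before looking at the detector itself.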
> Add magic for ASPRS Lidar data
> ------------------------------
>
> Key: TIKA-4067
> URL: https://issues.apache.org/jira/browse/TIKA-4067
> Project: Tika
> Issue Type: Sub-task
> Reporter: Gregory Lepore
> Priority: Minor
> Attachments:
> 0a0b002a319eea990e13da3d197fe4948e2cb8b72d02d5fa19c42382b1548a63,
> 0a0b757ac6ca7692a14645aed1ff2c2f5d7db11533087dac1855b521929d71c0,
> 0a0d0890d50d693831e5e3bb9f08e927760609408eaa5d86cf6e0d56e122e0e1
>
>
> The ASPRS Lidar data format occurs over 11,000 times in the latest Common
> Crawl dataset. There are three signatures, one for each of versions 1.0, 1.1,
> and 1.2.
> There does not appear to be a mime type that covers this format.
> The full signatures are below, but they can be simplified to:
>
> 4C415346\{20}01(00|01|02)
>
> which is ASCII LASF, followed after 20 bytes by one of 0100, 0101, or 0102
> for versions 1.0, 1.1, and 1.2.
>
> https://www.nationalarchives.gov.uk/PRONOM/fmt/370
>
> ||External signatures|File extension: las
> File extension: laz|
> ||Internal signatures||
> ||Name|ASPRS Lidar Data Exchange Format 1.0|
> ||Description|ASCII header: LASF, followed after 20 bytes by version number
> 1.0|
> ||Byte sequences||
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Byte order| |
> ||Value|4C415346\{20}0100\{78}[00:99]|
>
> ||External signatures|File extension: las
> File extension: laz|
> ||Internal signatures||
> ||Name|ASPRS Lidar Data Exchange Format 1.1|
> ||Description|ASCII header: LASF, followed after 20 bytes by version number
> 1.1|
> ||Byte sequences||
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Byte order| |
> ||Value|4C415346\{20}0101\{78}[00:99]|
>
> ||External signatures|File extension: las
> File extension: laz|
> ||Internal signatures||
> ||Name|ASPRS Lidar Data Exchange Format 1.2|
> ||Description|ASCII header: LASF, followed after 20 bytes by version number
> 1.2|
> ||Byte sequences||
> ||Position type|Absolute from BOF|
> ||Offset|0|
> ||Byte order| |
> ||Value|4C415346\{20}0102\{78}[00:99]|
>
>
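
The three full signatures quoted above can be sketched as a single check. This is an illustration under one reading of the PRONOM notation, namely that {n} skips n bytes and [00:99] is an inclusive hexadecimal byte range; under that reading the final byte tested sits at offset 4 + 20 + 2 + 78 = 104. The names below are hypothetical and this is not how Tika implements its magic.

```python
SIGNATURE = b"LASF"
VERSION_OFFSET = len(SIGNATURE) + 20      # 24: the two version bytes start here
RANGE_OFFSET = VERSION_OFFSET + 2 + 78    # 104: the byte tested by [00:99]


def match_pronom_las(data: bytes):
    """Return "1.0", "1.1", or "1.2" if data matches a quoted signature, else None."""
    if len(data) <= RANGE_OFFSET or not data.startswith(SIGNATURE):
        return None
    major, minor = data[VERSION_OFFSET], data[VERSION_OFFSET + 1]
    if major != 0x01 or minor not in (0x00, 0x01, 0x02):
        return None
    # Assumed meaning of [00:99]: any byte value from 0x00 through 0x99.
    if not 0x00 <= data[RANGE_OFFSET] <= 0x99:
        return None
    return f"{major}.{minor}"
```

The simplified form in the description drops the last test, so it needs only the first 26 bytes rather than 105.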