[ https://issues.apache.org/jira/browse/TIKA-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349286#comment-17349286 ]
Jon Sneyers commented on TIKA-3411:
-----------------------------------
I'm not actually a user of Tika, but I am the chair of the JPEG XL ad hoc group
within JPEG (ISO/IEC JTC 1 SC 29 WG 1).
The 2-byte version is expected to be the most common one on the web. The longer
one is the container format, used when XMP/Exif metadata needs to be attached.
On the web we expect such metadata will usually be stripped, so the compact,
codestream-only header will be used. That header is deliberately short (only
2 bytes) to avoid overhead; obviously this comes at the cost of a higher chance
of false-positive matches.
In JPEG, going back to JPEG-1, the convention for all codestream markers is
0xFF followed by a unique byte. The specific combination 0xFF0A was assigned to
"start of JPEG XL codestream", so it is meant to be a unique magic number.
Text files with a byte-order mark can start with 0xFF (the UTF-16LE BOM is
FF FE), but they should not be able to start with 0xFF0A unless they are
actually broken.
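To illustrate the marker convention, here is a small Python sketch that reads the byte following a leading 0xFF and looks it up in a deliberately tiny table (FF D8 is the JPEG-1 start-of-image marker; FF 0A starts a JPEG XL codestream). The names `KNOWN_START_MARKERS`, `leading_marker` and `describe_start` are hypothetical, and any code not in the table is simply reported as "other".

```python
from typing import Optional

# Two-byte markers that can appear at the very start of a file (not exhaustive).
KNOWN_START_MARKERS = {
    0xD8: "JPEG-1 start of image (SOI)",
    0x0A: "JPEG XL codestream",
}


def leading_marker(data: bytes) -> Optional[int]:
    """Return the code byte if data begins with 0xFF <code>, else None."""
    if len(data) >= 2 and data[0] == 0xFF:
        return data[1]
    return None


def describe_start(data: bytes) -> str:
    code = leading_marker(data)
    if code is None:
        return "no 0xFF marker"
    # Codes outside the small table above are not identified further.
    return KNOWN_START_MARKERS.get(code, f"other marker FF {code:02X}")
```

A UTF-16LE text file such as `b"\xff\xfeh\x00"` starts with 0xFF but yields code 0xFE, so `describe_start` reports `"other marker FF FE"` rather than a JPEG XL codestream.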
If you know of any examples of false-positive matches (other than broken
files), that would be greatly appreciated, because it would also be relevant for
many other applications (in particular browsers) that rely on magic sniffing.
Does Tika also consider filename extensions? If false positives are a concern,
it might be a good idea to treat 0xFF0A as a match only when the extension is
.jxl.
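A sketch of that suggestion in Python, assuming the caller has both the leading bytes and a filename: the 12-byte container signature is trusted on its own, while the 2-byte codestream signature counts only when the name ends in `.jxl`. The function `looks_like_jxl` is hypothetical and says nothing about how Tika itself combines magic and filename globs.

```python
import os

JXL_CODESTREAM_SIG = b"\xff\x0a"
JXL_CONTAINER_SIG = b"\x00\x00\x00\x0cJXL \x0d\x0a\x87\x0a"


def looks_like_jxl(header: bytes, filename: str = "") -> bool:
    """Heuristic: accept the container signature unconditionally, but accept
    the short codestream signature only for files named *.jxl."""
    if header.startswith(JXL_CONTAINER_SIG):
        return True
    # Case-insensitive extension check; "" when there is no extension.
    ext = os.path.splitext(filename)[1].lower()
    return header.startswith(JXL_CODESTREAM_SIG) and ext == ".jxl"
```

With this rule, a file named `notes.txt` that happens to begin with FF 0A is not reported as JPEG XL, at the price of missing bare codestreams that lack the conventional extension.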
> Add image/jxl
> -------------
>
> Key: TIKA-3411
> URL: https://issues.apache.org/jira/browse/TIKA-3411
> Project: Tika
> Issue Type: Wish
> Components: mime
> Reporter: Jon Sneyers
> Priority: Major
>
> image/jxl is the media type for JPEG XL (ISO/IEC 18181).
> Conventional filename extension is .jxl
> It is quite straightforward to detect based on magic: there are two possible
> header byte sequences:
> {{FF 0A}}
> or
> {{00 00 00 0C 4A 58 4C 20 0D 0A 87 0A}}
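The 12-byte container signature quoted above is itself a complete box in the ISO BMFF style that the JPEG XL container uses: a 32-bit big-endian length (0x0000000C = 12), a four-character type (`JXL `), and a 4-byte payload (0D 0A 87 0A). A minimal Python sketch that splits it apart; `parse_first_box` is a hypothetical helper, not a full box parser.

```python
import struct

# Container signature as listed in the issue description.
SIGNATURE = bytes.fromhex("0000000C4A584C200D0A870A")


def parse_first_box(data: bytes):
    """Split the first box into (size, type, payload)."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes for a box header")
    (size,) = struct.unpack(">I", data[:4])  # 32-bit big-endian box length
    # Size values 0 and 1 (special cases in ISO BMFF) are not handled here.
    if size < 8 or size > len(data):
        raise ValueError("truncated or unsupported box")
    box_type = data[4:8].decode("ascii")
    payload = data[8:size]
    return size, box_type, payload
```

Applied to `SIGNATURE`, this gives a size of 12, the type `"JXL "` and the payload `b"\r\n\x87\n"`.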
--
This message was sent by Atlassian Jira
(v8.3.4#803005)