Andreas Meier created TIKA-2576:
-----------------------------------
Summary: Add application/zstd detection and parser
Key: TIKA-2576
URL: https://issues.apache.org/jira/browse/TIKA-2576
Project: Tika
Issue Type: Improvement
Components: detector, parser
Reporter: Andreas Meier
Attachments: huffman-compressed-larger,
huffmann-compressed-larger-result.txt
The IETF is currently reviewing the specification of Zstandard compression and
the application/zstd media type:
[https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html|https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html]
Once application/zstd is registered as a standard media type, Tika should
support it with detection and a parser.
A possible MIME detection entry for tika-mimetypes.xml (the second comment must
be updated once the standard is final):
{code:xml}
<mime-type type="application/zstd">
  <_comment>https://en.wikipedia.org/wiki/Zstandard</_comment>
  <_comment>https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html</_comment>
  <magic priority="50">
    <match value="0xFD2FB528" type="little32" offset="0"/>
  </magic>
  <glob pattern="*.zstd"/>
</mime-type>
{code}
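To illustrate what the magic rule above matches, here is a minimal Java sketch of the same check: it reads the first four bytes as a little-endian 32-bit integer and compares them with 0xFD2FB528 (on disk, the bytes 28 B5 2F FD). The class and method names (ZstdMagicSketch, looksLikeZstd) are hypothetical and not part of Tika. The sketch only covers the standard frame magic from the proposed rule, nothing else.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not a Tika class: mirrors the proposed
// <match value="0xFD2FB528" type="little32" offset="0"/> rule.
final class ZstdMagicSketch {

    // Zstandard frame magic number, interpreted as a little-endian 32-bit value.
    static final int ZSTD_MAGIC = 0xFD2FB528;

    // Returns true if the first four bytes of the prefix equal the zstd magic.
    static boolean looksLikeZstd(byte[] prefix) {
        if (prefix == null || prefix.length < 4) {
            return false; // too short to contain the magic number
        }
        int first = ByteBuffer.wrap(prefix, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return first == ZSTD_MAGIC;
    }

    public static void main(String[] args) {
        // Byte order as it appears at offset 0 of a zstd file.
        byte[] header = {0x28, (byte) 0xB5, 0x2F, (byte) 0xFD};
        System.out.println("zstd magic detected: " + looksLikeZstd(header));
    }
}
```

In Tika itself the match is driven by the XML entry, so no such code is needed; the sketch is only meant to make the byte order behind type="little32" explicit.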
commons-compress 1.16 and later provides a compressor and decompressor
for the algorithm, based on com.github.luben zstd-jni:
[https://github.com/luben/zstd-jni|https://github.com/luben/zstd-jni]
Attached are a sample zstd file (huffman-compressed-larger) and the result of
decompressing it.
Decompression was done with commons-compress 1.16.1 and zstd-jni 1.3.3-3:
{code:xml}
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-compress</artifactId>
  <version>1.16.1</version>
</dependency>
<dependency>
  <groupId>com.github.luben</groupId>
  <artifactId>zstd-jni</artifactId>
  <version>1.3.3-3</version>
</dependency>
{code}
Regards
Andreas