Andreas Meier created TIKA-2576:
-----------------------------------

             Summary: Add application/zstd detection and parser
                 Key: TIKA-2576
                 URL: https://issues.apache.org/jira/browse/TIKA-2576
             Project: Tika
          Issue Type: Improvement
          Components: detector, parser
            Reporter: Andreas Meier
         Attachments: huffman-compressed-larger, 
huffmann-compressed-larger-result.txt

The IETF is currently reviewing the specification of Zstandard compression and 
the application/zstd media type: 
[https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html|https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html]

Once application/zstd is registered as a standard media type, Tika should 
support detecting and parsing it.

A possible MIME detection entry for tika-mimetypes.xml (the second comment 
must be updated once the standard is final):

{code:xml}
  <mime-type type="application/zstd">
    <_comment>https://en.wikipedia.org/wiki/Zstandard</_comment>
    <_comment>https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html</_comment>
    <magic priority="50">
      <match value="0xFD2FB528" type="little32" offset="0"/>
    </magic>
    <glob pattern="*.zstd"/>
  </mime-type>
{code}
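
To illustrate what the magic rule above matches, here is a minimal Java sketch of the same check done by hand. The class name ZstdMagic and its methods are hypothetical and not part of Tika. It assumes the magic number 0xFD2FB528 is stored little-endian, so the first four bytes on disk are 28 B5 2F FD, and it only looks at the start of the data:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Tika): tests for the Zstandard magic number. */
class ZstdMagic {

    // 0xFD2FB528 read as a little-endian 32-bit value, i.e. the on-disk byte order.
    private static final byte[] MAGIC = {0x28, (byte) 0xB5, 0x2F, (byte) 0xFD};

    /** Returns true if the buffer starts with the four magic bytes. */
    static boolean isZstd(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first four bytes of the stream and tests them against the magic number. */
    static boolean isZstd(InputStream in) throws IOException {
        byte[] header = new byte[MAGIC.length];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                return false; // stream ended before four bytes were available
            }
            read += n;
        }
        return isZstd(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZstdMagic <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(isZstd(in) ? "application/zstd" : "unknown");
        }
    }
}
```

Running it against the attached sample (java ZstdMagic huffman-compressed-larger) should give the same answer as the magic rule, provided the byte-order assumption holds.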

commons-compress 1.16 and later provides a compressor and decompressor 
for the algorithm, based on com.github.luben zstd-jni:
[https://github.com/luben/zstd-jni|https://github.com/luben/zstd-jni]

Attached are a sample zstd file (huffman-compressed-larger) and the result of 
decompressing it.

Decompression was done with commons-compress 1.16.1 and zstd-jni 1.3.3-3:

{code:xml}
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-compress</artifactId>
  <version>1.16.1</version>
</dependency>
<dependency>
  <groupId>com.github.luben</groupId>
  <artifactId>zstd-jni</artifactId>
  <version>1.3.3-3</version>
</dependency>
{code}


Regards

Andreas



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
