Mehdi Ennaime created COMPRESS-646:
--------------------------------------

             Summary: Improve performance of the Snappy Framed I/O streams
                 Key: COMPRESS-646
                 URL: https://issues.apache.org/jira/browse/COMPRESS-646
             Project: Commons Compress
          Issue Type: Wish
          Components: Compressors
    Affects Versions: 1.22
         Environment: Java 11.0.2 (OpenJDK)
tested on both Windows 10 and Linux (Ubuntu 20.04)
            Reporter: Mehdi Ennaime
         Attachments: Tools.java

Hello,

I've been using the Snappy format to quickly compress and decompress JSON 
files, using the {{FramedSnappyCompressorOutputStream}} and 
{{FramedSnappyCompressorInputStream}} classes provided by Apache Commons 
Compress, since I already had several dependencies on Commons Compress.

Although compression and decompression work fine for every file, users 
started to report performance issues with large files.

When I tested these streams, their performance was very underwhelming.

Decompressing a 90 MB json.sz file (1.5 GB as decompressed .json) took 
2 minutes, which is far from the expected performance of a Snappy stream, 
which "[...] does not aim for maximum compression, or compatibility with any 
other compression library; instead, it aims for very high speeds and reasonable 
compression.".

Switching to the framed I/O streams from xerial/snappy-java reduced the 
compression/decompression times by two orders of magnitude.

Running the same code from the attached [^Tools.java] through a Maven command 
took 1.5 s after replacing the stream implementation with 
{{org.xerial.snappy.SnappyFramedInputStream}}, versus a consistent 125+ s 
with {{FramedSnappyCompressorInputStream}}.

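For reference, here is a minimal sketch of the kind of measurement described above, assuming the test simply reads the decompressed stream to the end. It is not the attached Tools.java; the names {{DrainTimer}}, {{StreamWrapper}}, {{drain}} and {{timeMillis}} are hypothetical and only serve the illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical timing harness (illustrative only, not the attached Tools.java). */
final class DrainTimer {

    /** Wraps a raw input stream in a decompressing stream; may throw like the real constructors do. */
    @FunctionalInterface
    interface StreamWrapper {
        InputStream wrap(InputStream raw) throws IOException;
    }

    /** Reads the wrapped stream to the end and returns the number of bytes produced. */
    static long drain(Path file, StreamWrapper wrapper) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // bulk reads, so per-byte call overhead is not measured
        long total = 0;
        try (InputStream raw = new BufferedInputStream(Files.newInputStream(file));
             InputStream in = wrapper.wrap(raw)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Returns the elapsed wall-clock time, in milliseconds, of one full drain. */
    static long timeMillis(Path file, StreamWrapper wrapper) throws IOException {
        long start = System.nanoTime();
        drain(file, wrapper);
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With both libraries on the classpath, the two implementations could then be compared with {{DrainTimer.timeMillis(path, FramedSnappyCompressorInputStream::new)}} and {{DrainTimer.timeMillis(path, SnappyFramedInputStream::new)}}. A single cold run like this is only a rough indication, not a proper benchmark.
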
Since it's not a bug, I'm not flagging this ticket as such, but it makes 
using the Apache Commons Compress library pointless for that format, and even 
counter-productive.

Bringing performance up to par with other implementations, or deprecating the 
decompressor, would be greatly appreciated.

I've tried to upload the aforementioned file, but Jira refuses to take it, as 
the direct upload limit is 60 MB. I should, however, be able to provide a file 
of around 40 MB if necessary.

Best Regards,

Mehdi Ennaïme



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
