[
https://issues.apache.org/jira/browse/COMPRESS-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mehdi Ennaime updated COMPRESS-646:
-----------------------------------
Description:
Hello,
I've been using the Snappy format to quickly compress and decompress JSON
files, using the
{{FramedSnappyCompressorOutputStream}} and
{{FramedSnappyCompressorInputStream}} provided by Apache Commons Compress,
since I already had several dependencies on that library.
Although compression and decompression work correctly for every file, feedback
about performance issues with large files started to emerge.
When I tested them, the performance of these streams was very underwhelming.
Decompressing a 90 MB json.sz file (1.5 GB as decompressed .json) took about
2 minutes, which is far from the expected performance of a Snappy stream,
which "[...] does not aim for maximum compression, or compatibility with any
other compression library; instead, it aims for very high speeds and reasonable
compression."
Switching to xerial/snappy-java's framed I/O streams reduced the
compression/decompression times by roughly two orders of magnitude.
Running the same code from the attached [^Tools.java] through a Maven command
took 1.5 s after replacing the stream implementation with
{{org.xerial.snappy.SnappyFramedInputStream}}, versus a consistent 125+ s
with {{FramedSnappyCompressorInputStream}}.
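For anyone who wants to reproduce the comparison without the attachment, a
minimal timing sketch could look like the one below. It is not the attached
Tools.java: the class name DecompressTiming, its methods and the Decompressor
interface are hypothetical, and the JDK's GZIPInputStream is used only as a
stand-in so the snippet runs without extra dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical harness, not the attached Tools.java.
public class DecompressTiming {

    /** Wraps a raw compressed stream in a decompressing stream. */
    @FunctionalInterface
    public interface Decompressor {
        InputStream wrap(InputStream raw) throws IOException;
    }

    /** Reads the stream to the end in 64 KiB chunks and returns the byte count. */
    public static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    /** Decompresses the payload once, prints the wall-clock time, returns the byte count. */
    public static long timeDecompression(String label, byte[] compressed, Decompressor d)
            throws IOException {
        long start = System.nanoTime();
        long bytes;
        try (InputStream in = d.wrap(new ByteArrayInputStream(compressed))) {
            bytes = drain(in);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + bytes + " bytes in " + elapsedMs + " ms");
        return bytes;
    }

    /** Builds a gzip payload of zero bytes so the sketch has something to decompress. */
    public static byte[] gzipSample(int size) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(new byte[size]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzipSample(1_000_000);
        // Stand-in decompressor. With the libraries on the classpath and a .sz payload,
        // one would pass FramedSnappyCompressorInputStream::new or
        // SnappyFramedInputStream::new here instead.
        timeDecompression("gzip stand-in", compressed, GZIPInputStream::new);
    }
}
```

A single run like this includes JIT warm-up, so it only makes sense for
differences as large as the ones reported here.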
Since it's not a bug, I'm not flagging this ticket as such, but it makes using
the Apache Commons Compress library pointless for that format, and even
counter-productive.
Bringing performance up to par with other implementations, or deprecating the
decompressor, would be greatly appreciated.
I tried to upload the aforementioned file, but Jira refuses it because the
direct upload limit is 60 MB. I should, however, be able to provide a file of
around 40 MB if necessary.
Best Regards,
Mehdi Ennaïme
> Improve performance of the Snappy Framed I/O streams
> ----------------------------------------------------
>
> Key: COMPRESS-646
> URL: https://issues.apache.org/jira/browse/COMPRESS-646
> Project: Commons Compress
> Issue Type: Wish
> Components: Compressors
> Affects Versions: 1.22
> Environment: java 11.0.2 (openjdk )
> tested on both Windows 10 and linux (Ubuntu 20.04)
> Reporter: Mehdi Ennaime
> Priority: Minor
> Attachments: Tools.java