Pascal Davoust created COMPRESS-600:
---------------------------------------
Summary: Add capability to configure Deflater strategy in
GzipCompressorOutputStream
Key: COMPRESS-600
URL: https://issues.apache.org/jira/browse/COMPRESS-600
Project: Commons Compress
Issue Type: Improvement
Components: Compressors
Affects Versions: 1.21
Environment: Any JDK-based environment.
Reporter: Pascal Davoust
The {{GzipCompressorOutputStream}} uses a {{java.util.zip.Deflater}} to perform
the compression heavy lifting.
However, the {{java.util.zip.Deflater}} class (which delegates to the
underlying native {{zlib}} library) allows a strategy to be specified. The
strategy controls which parts of the deflate algorithm are used, while the
output stays fully compatible with the deflate format and requires no change on
the decoding side. See
[https://docs.oracle.com/javase/8/docs/api/java/util/zip/Deflater.html#setStrategy-int-]
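To illustrate the point about decoder compatibility, here is a minimal sketch
using only the JDK. The class and method names ({{DeflaterStrategySketch}},
{{deflate}}, {{inflate}}) are made up for this example and are not part of
Commons Compress. It compresses the same input with each of the three
strategies the JDK exposes and decodes every result with a plain {{Inflater}}
that is never told which strategy was used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative names only; nothing here is Commons Compress API.
final class DeflaterStrategySketch {

    /** Compresses input (zlib format) with the given Deflater strategy. */
    static byte[] deflate(byte[] input, int strategy) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        // One of DEFAULT_STRATEGY, FILTERED or HUFFMAN_ONLY.
        deflater.setStrategy(strategy);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses zlib data without any knowledge of the strategy used. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "hello hello hello deflate strategies"
                .getBytes(StandardCharsets.UTF_8);
        int[] strategies = {
            Deflater.DEFAULT_STRATEGY, Deflater.FILTERED, Deflater.HUFFMAN_ONLY
        };
        for (int strategy : strategies) {
            byte[] packed = deflate(data, strategy);
            System.out.println("strategy " + strategy + ": " + packed.length
                    + " bytes, round trip ok: "
                    + Arrays.equals(data, inflate(packed)));
        }
    }
}
```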
Adding the ability to set this strategy through {{GzipParameters}} would be a
very welcome addition, as there is currently no way to do so by subclassing
{{GzipCompressorOutputStream}}.
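For comparison, the JDK's own {{java.util.zip.GZIPOutputStream}} can be
subclassed for this purpose, because {{DeflaterOutputStream}} exposes its
{{Deflater}} as the protected field {{def}}. The sketch below is a JDK-only
workaround, not a Commons Compress solution: the names
{{StrategyGzipSketch}}, {{StrategyGzipOutputStream}}, {{gzip}} and {{gunzip}}
are hypothetical, and this route cannot set the header fields that
{{GzipParameters}} covers (file name, comment and so on).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative names only; this uses the JDK gzip stream, not Commons Compress.
final class StrategyGzipSketch {

    /** JDK gzip stream whose Deflater strategy is chosen at construction. */
    static final class StrategyGzipOutputStream extends GZIPOutputStream {
        StrategyGzipOutputStream(OutputStream out, int strategy) throws IOException {
            super(out);
            // 'def' is the protected Deflater inherited from DeflaterOutputStream.
            def.setStrategy(strategy);
        }
    }

    /** Gzips data with the given strategy, e.g. Deflater.HUFFMAN_ONLY. */
    static byte[] gzip(byte[] data, int strategy) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream gz = new StrategyGzipOutputStream(sink, strategy)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decodes with a stock GZIPInputStream; no strategy information is needed. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVo=".getBytes();
        byte[] packed = gzip(data, Deflater.HUFFMAN_ONLY);
        System.out.println(new String(gunzip(packed)));
    }
}
```

Having the strategy available in {{GzipParameters}} would make this kind of
workaround unnecessary.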
The rationale behind this request is related to compressing base64-heavy
content.
It turns out that because base64 breaks byte alignment, the LZ77 part of the
deflate algorithm runs in sub-optimal conditions (read: it is defeated most of
the time), consuming CPU cycles for almost no gain.
Skipping the LZ77 part of the deflate algorithm and using Huffman coding only
does the job pretty well (at least in our case): compression is 3x to 5x faster
(in CPU cycles) and saves 26% of the initial data size instead of 27% with
default settings. The drop in compression ratio is minimal compared with the
very significant CPU usage win.
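A rough way to try this comparison is sketched below. It is not the
measurement behind the figures above: the input is synthetic (base64 text of
pseudo-random bytes with a fixed seed), the names
({{Base64StrategyComparison}}, {{deflatedSize}}, {{sampleBase64}}) are invented
for the example, and the single {{System.nanoTime()}} timing is only a crude
indication. A proper benchmark harness would be needed for trustworthy
numbers, and real content will behave differently.

```java
import java.util.Base64;
import java.util.Random;
import java.util.zip.Deflater;

// Illustrative names and synthetic data; not the reporter's benchmark.
final class Base64StrategyComparison {

    /** Returns the raw-deflate size of input for a strategy, discarding the output. */
    static int deflatedSize(byte[] input, int strategy) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setStrategy(strategy);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[64 * 1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Synthetic input: base64 encoding of rawBytes pseudo-random bytes. */
    static byte[] sampleBase64(int rawBytes) {
        byte[] raw = new byte[rawBytes];
        new Random(42).nextBytes(raw);
        return Base64.getEncoder().encode(raw);
    }

    public static void main(String[] args) {
        byte[] data = sampleBase64(3 * 1024 * 1024);
        int[] strategies = {Deflater.DEFAULT_STRATEGY, Deflater.HUFFMAN_ONLY};
        for (int strategy : strategies) {
            long start = System.nanoTime();
            int size = deflatedSize(data, strategy);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("strategy=%d input=%d output=%d time=%dus%n",
                    strategy, data.length, size, micros);
        }
    }
}
```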
Irrespective of our own use case and measurements, exposing this already proven
and available feature looks like a very slick addition to this utility class.
I'm happy to provide a PR for you guys to review, just let me know.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)