[ 
https://issues.apache.org/jira/browse/COMPRESS-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pascal Davoust updated COMPRESS-600:
------------------------------------
    Description: 
The {{GzipCompressorOutputStream}} uses a {{java.util.zip.Deflater}} to perform 
the compression heavy lifting.

However, the {{java.util.zip.Deflater}} class (which delegates to the 
underlying native {{zlib}} library) allows specifying a strategy that 
determines which parts of the deflate algorithm are used. The output remains 
fully compatible with the deflate format, so nothing changes on the decoding 
side. See 
[https://docs.oracle.com/javase/8/docs/api/java/util/zip/Deflater.html#setStrategy-int-]
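
To make the point about the strategy concrete, here is a minimal sketch using only the JDK classes. It compresses with {{Deflater.HUFFMAN_ONLY}} and decompresses with a stock {{Inflater}} that knows nothing about the strategy. The class name {{DeflaterStrategyDemo}} and its helper methods are made up for illustration and are not part of Commons Compress.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class, not part of Commons Compress.
public class DeflaterStrategyDemo {

    /** Compresses input with the given strategy, e.g. Deflater.HUFFMAN_ONLY. */
    public static byte[] deflate(byte[] input, int strategy) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setStrategy(strategy); // set before any data is compressed
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native zlib resources
        return out.toByteArray();
    }

    /** Decompresses with a default Inflater; the strategy is not needed here. */
    public static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or dictionary-dependent stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(input, Deflater.HUFFMAN_ONLY);
        byte[] unpacked = inflate(packed);
        System.out.println(new String(unpacked, StandardCharsets.UTF_8));
    }
}
```

The same {{inflate}} helper works whichever strategy constant is passed to {{deflate}}, which is the compatibility property the request relies on.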

Adding the ability to set this strategy through {{GzipParameters}} would be a 
very welcome addition, as there is no way to subclass and extend 
{{GzipCompressorOutputStream}} to do so.

The rationale behind this request is related to compressing base64-heavy 
content.

It turns out that, because base64 breaks byte alignment, the LZ77 part of the 
deflate algorithm runs in sub-optimal conditions (read: it is defeated most 
of the time), consuming CPU cycles for almost no gain.
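
One way to picture the alignment problem: base64 maps every 3 input bytes to 4 output characters, so the same source bytes produce different characters depending on their offset modulo 3. The sketch below encodes the same word behind 0 to 3 filler bytes and checks whether the offset-0 encoding reappears. The class name {{Base64AlignmentDemo}} and the sample word are arbitrary choices for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; the sample word is an arbitrary choice.
public class Base64AlignmentDemo {

    /** Base64-encodes the word preceded by prefixLength filler bytes ('x'). */
    public static String encodeWithPrefix(int prefixLength, String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < prefixLength; i++) {
            sb.append('x');
        }
        sb.append(word);
        byte[] bytes = sb.toString().getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes);
    }

    /** First 16 base64 characters of the word at offset 0 (12 input bytes, no padding). */
    public static String alignedCore(String word) {
        return encodeWithPrefix(0, word).substring(0, 16);
    }

    public static void main(String[] args) {
        String word = "GzipParameters";
        String core = alignedCore(word);
        for (int offset = 0; offset <= 3; offset++) {
            String encoded = encodeWithPrefix(offset, word);
            // Only offsets that are multiples of 3 reproduce the aligned encoding.
            System.out.println("offset " + offset + ": " + encoded
                    + "  contains aligned form: " + encoded.contains(core));
        }
    }
}
```

Since LZ77 looks for repeated byte sequences in its input, a repetition in the original data that lands on a different offset modulo 3 no longer shows up as a repetition in the base64 text.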

Skipping the LZ77 part of the deflate algorithm and using Huffman coding only 
does the job pretty well (at least in our case): compression takes 3x to 5x 
less time (= CPU cycles) and saves 26% of the initial data size instead of 27% 
with default settings. The drop in compression ratio is minimal compared with 
the very significant CPU usage win.
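
For anyone who wants to try a comparison of this kind on their own data, here is a rough sketch. It is not the measurement setup behind the figures above: it uses the JDK's {{java.util.zip.GZIPOutputStream}} (whose protected {{def}} field gives access to the {{Deflater}}) rather than Commons Compress, random base64 text as a stand-in for real content, and naive wall-clock timing with no warm-up. The class name {{GzipStrategyComparison}} is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical comparison harness using only JDK classes.
public class GzipStrategyComparison {

    /** Gzips input with the given Deflater strategy via the protected 'def' field. */
    public static byte[] gzip(byte[] input, final int strategy) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink) {
            { def.setStrategy(strategy); } // runs before any data is written
        }) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decompresses with a stock GZIPInputStream. */
    public static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Base64 text of pseudo-random bytes; a stand-in for real base64-heavy content. */
    public static byte[] sampleBase64(int rawBytes, long seed) {
        byte[] raw = new byte[rawBytes];
        new Random(seed).nextBytes(raw);
        return Base64.getEncoder().encode(raw);
    }

    public static void main(String[] args) {
        byte[] input = sampleBase64(3_000_000, 42L);
        int[] strategies = {Deflater.DEFAULT_STRATEGY, Deflater.HUFFMAN_ONLY};
        for (int strategy : strategies) {
            long start = System.nanoTime();
            byte[] out = gzip(input, strategy);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("strategy " + strategy + ": " + input.length
                    + " -> " + out.length + " bytes in " + millis + " ms");
        }
    }
}
```

Results will depend heavily on the input, the JDK, and the zlib build, so a proper benchmark harness would be needed before drawing conclusions.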

Irrespective of our own use case and measurements, this looks like a very slick 
addition to this utility class, exposing an already proven and available 
feature.

I'm happy to provide a PR for you guys to review; just let me know.


> Add capability to configure Deflater strategy in GzipCompressorOutputStream
> ---------------------------------------------------------------------------
>
>                 Key: COMPRESS-600
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-600
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>    Affects Versions: 1.21
>         Environment: Any JDK-based environment.
>            Reporter: Pascal Davoust
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
