[
https://issues.apache.org/jira/browse/HBASE-26259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418904#comment-17418904
]
Andrew Kyle Purtell commented on HBASE-26259:
---------------------------------------------
Some updates:
I couldn't get Maven to do what I wanted with shading in the compression
modules. It's not critical, so I dropped it.
Optimized compress(): if the out array is large enough to hold the compressed
output, we now compress into it directly and avoid double buffering. Whether
this path is taken depends on the configured buffer size. There isn't the same
opportunity in decompress(), because we can expect the input to often expand
beyond the length of the out array.
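To illustrate the compress() fast path described above, here is a minimal sketch. It is not the code from the patch: the class name, the `spill` buffer, the `drain()` helper, and the toy "stored" block format (a 4-byte length header followed by the raw bytes) are all assumptions made so the example is self-contained. The point is only the branch: when the caller's out array can hold the worst-case compressed size, write to it directly; otherwise compress into an intermediate buffer and copy out what fits.

```java
/** Hypothetical sketch; names are illustrative and not taken from the HBASE-26259 patch. */
final class DirectCompressSketch {

  /** Worst-case output size for the toy "stored" format: 4-byte length header plus payload. */
  static int maxCompressedLength(int inputLen) {
    return inputLen + 4;
  }

  /** Stand-in for a real block compressor: writes a length header, then copies the input. */
  static int compressBlock(byte[] in, int inOff, int inLen, byte[] out, int outOff) {
    out[outOff] = (byte) (inLen >>> 24);
    out[outOff + 1] = (byte) (inLen >>> 16);
    out[outOff + 2] = (byte) (inLen >>> 8);
    out[outOff + 3] = (byte) inLen;
    System.arraycopy(in, inOff, out, outOff + 4, inLen);
    return inLen + 4;
  }

  private byte[] spill = new byte[0]; // intermediate buffer, used only on the slow path
  private int spillPos;
  private int spillLen;
  boolean lastCallWasDirect;          // exposed only so the example can be checked

  /** Compresses one block into out and returns the number of bytes written to out. */
  int compress(byte[] in, int inOff, int inLen, byte[] out, int outOff, int outLen) {
    int needed = maxCompressedLength(inLen);
    if (outLen >= needed) {
      // Fast path: out is large enough, so skip the intermediate buffer entirely.
      lastCallWasDirect = true;
      return compressBlock(in, inOff, inLen, out, outOff);
    }
    // Slow path: compress into the intermediate buffer, then hand out what fits.
    lastCallWasDirect = false;
    if (spill.length < needed) {
      spill = new byte[needed];
    }
    spillLen = compressBlock(in, inOff, inLen, spill, 0);
    spillPos = 0;
    return drain(out, outOff, outLen);
  }

  /** Copies any remaining buffered output into out; returns the number of bytes copied. */
  int drain(byte[] out, int outOff, int outLen) {
    int n = Math.min(outLen, spillLen - spillPos);
    System.arraycopy(spill, spillPos, out, outOff, n);
    spillPos += n;
    return n;
  }
}
```

In decompress() the equivalent check would rarely succeed, since the decompressed size is usually larger than the out array the caller supplies.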
The LZMA codec is now compatible with Hadoop block compression streams. Until
now this work was actually incomplete, because a codec must be compatible in
order to be used for HFile compression.
> Fallback support to pure Java compression
> -----------------------------------------
>
> Key: HBASE-26259
> URL: https://issues.apache.org/jira/browse/HBASE-26259
> Project: HBase
> Issue Type: Sub-task
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
> Attachments: BenchmarkCodec.java, BenchmarksMain.java,
> RandomDistribution.java, ac_lz4_results.pdf, ac_snappy_results.pdf,
> ac_zstd_results.pdf, lz4_lz4-java_result.pdf, xerial_snappy_results.pdf
>
>
> Airlift’s aircompressor
> (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2
> licensed library, for Java 8 and up, available in Maven central, which
> provides pure Java implementations of gzip, lz4, lzo, snappy, and zstd and
> Hadoop compression codecs for same, claiming “_they are typically 300% faster
> than the JNI wrappers_.” (https://github.com/airlift/aircompressor). This
> library is under active development and has up-to-date releases because it is
> used by Trino.
> Proposed changes:
> * Modify Compression.java such that compression codec implementation classes
> can be specified by configuration. Currently they are hardcoded as strings.
> * Pull in aircompressor as a ‘compile’ time dependency so it will be bundled
> into our build and made available on the server classpath.
> * Modify Compression.java to fall back to an aircompressor pure Java
> implementation if the schema specifies a compression algorithm, a Hadoop
> native codec was specified as the desired implementation, but the requisite
> native support is somehow not available.
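
The proposed changes above (codec class chosen by configuration, with a fallback to a pure Java implementation when the configured one cannot be loaded) could be sketched roughly as follows. This is an illustration only, not what Compression.java does: `java.util.Properties` stands in for the Hadoop Configuration, and the `Codec` interface, the `PureJavaLz4` class, and the key `example.compress.lz4.codec` are hypothetical names invented for the example.

```java
import java.util.Properties;

/** Hypothetical sketch of configurable codec selection with a pure Java fallback. */
final class CodecFallbackSketch {

  /** Stand-in for a compression codec interface (an assumption, not the Hadoop API). */
  interface Codec {
    String name();
  }

  /** Stand-in for a pure Java implementation that can always be loaded. */
  static final class PureJavaLz4 implements Codec {
    public String name() {
      return "pure-java-lz4";
    }
  }

  // Hypothetical configuration key; not a real HBase or Hadoop property name.
  static final String LZ4_CODEC_KEY = "example.compress.lz4.codec";
  static final String LZ4_FALLBACK = PureJavaLz4.class.getName();

  /** Loads and instantiates a codec class by name. */
  static Codec load(String className) throws ReflectiveOperationException {
    return (Codec) Class.forName(className).getDeclaredConstructor().newInstance();
  }

  /** Resolves the codec named in conf, falling back to the pure Java one if loading fails. */
  static Codec resolve(Properties conf) {
    String configured = conf.getProperty(LZ4_CODEC_KEY, LZ4_FALLBACK);
    try {
      return load(configured);
    } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
      // A missing class or missing native library typically surfaces here
      // (ClassNotFoundException, UnsatisfiedLinkError, NoClassDefFoundError).
      try {
        return load(LZ4_FALLBACK);
      } catch (ReflectiveOperationException e2) {
        throw new IllegalStateException("Fallback codec could not be loaded", e2);
      }
    }
  }
}
```

A real implementation would probably also log which implementation was selected, so operators can tell when the fallback is in effect.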