[
https://issues.apache.org/jira/browse/HBASE-26259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418687#comment-17418687
]
Andrew Kyle Purtell edited comment on HBASE-26259 at 9/22/21, 4:09 PM:
-----------------------------------------------------------------------
Of particular interest is the hbase-compression-lz4 module. In these results,
lz4-java is up to 52% faster than the Hadoop native lz4 codec for
decompression, and is generally faster for both compression and decompression
across all test cases. Further optimization of the integration might improve
this. If this JIRA is committed, we will make hbase-compression-lz4 the new
default for WAL value compression, and will also choose it for another case
where we need a fast compression codec that is known to always be available at
runtime (no JIRA filed for that yet).
was (Author: apurtell):
Of particular interest is the hbase-compression-lz4 module. lz4-java is up to
52% faster than Hadoop native lz4 codec for decompression in these results, and
faster for both compression and decompression in every test case in general.
This might be improved further through additional optimization of the
integration. Anyway, if this JIRA is committed, we will make
hbase-compression-lz4 the new default for WAL value compression, and also
choose it for another case where we need a fast compression codec (no JIRA
filed for that yet).
> Fallback support to pure Java compression
> -----------------------------------------
>
> Key: HBASE-26259
> URL: https://issues.apache.org/jira/browse/HBASE-26259
> Project: HBase
> Issue Type: Sub-task
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
> Attachments: BenchmarkCodec.java, BenchmarksMain.java,
> RandomDistribution.java, ac_lz4_results.pdf, ac_snappy_results.pdf,
> ac_zstd_results.pdf, lz4_lz4-java_result.pdf, xerial_snappy_results.pdf
>
>
> Airlift’s aircompressor
> (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2
> licensed library for Java 8 and up, available in Maven Central. It provides
> pure Java implementations of gzip, lz4, lzo, snappy, and zstd, along with
> Hadoop compression codecs for the same, and claims “_they are typically 300%
> faster than the JNI wrappers_.” (https://github.com/airlift/aircompressor).
> The library is under active development and has up-to-date releases because
> it is used by Trino.
> Proposed changes:
> * Modify Compression.java such that compression codec implementation classes
> can be specified by configuration. Currently they are hardcoded as strings.
> * Pull in aircompressor as a ‘compile’ time dependency so it will be bundled
> into our build and made available on the server classpath.
> * Modify Compression.java to fall back to an aircompressor pure Java
> implementation if the schema specifies a compression algorithm and a Hadoop
> native codec was specified as the desired implementation, but the requisite
> native support is somehow not available.
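
The first and third proposed changes (configurable implementation class, with fallback to a pure Java implementation) could be sketched roughly as follows. This is only an illustration and not the code in Compression.java: the class `CodecResolver`, the key `example.io.compress.lz4.codec`, and the class names passed in are all hypothetical, and "native support is available" is approximated here by "the class can be loaded", which is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: picks a codec implementation class
// name from configuration and falls back to an always-available one.
final class CodecResolver {
  // Hypothetical configuration key, invented for this sketch.
  static final String LZ4_CODEC_KEY = "example.io.compress.lz4.codec";

  // Returns the configured class name (or defaultImpl if the key is unset)
  // when it can be loaded; otherwise returns fallbackImpl.
  static String resolve(Map<String, String> conf, String key,
      String defaultImpl, String fallbackImpl) {
    String configured = conf.getOrDefault(key, defaultImpl);
    if (isLoadable(configured)) {
      return configured;
    }
    if (isLoadable(fallbackImpl)) {
      return fallbackImpl;
    }
    throw new IllegalStateException(
        "Neither " + configured + " nor " + fallbackImpl + " is available");
  }

  // Assumption: loadability stands in for "requisite support is available".
  static boolean isLoadable(String className) {
    try {
      Class.forName(className, false, CodecResolver.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Stand-in for a native-backed codec that is missing at runtime.
    conf.put(LZ4_CODEC_KEY, "com.example.MissingNativeCodec");
    // A JDK class stands in for the bundled pure Java fallback.
    System.out.println(resolve(conf, LZ4_CODEC_KEY,
        "java.util.zip.Inflater", "java.util.zip.Deflater"));
  }
}
```

In this sketch a deployment would set the key to the preferred implementation, and a lookup that cannot be satisfied resolves to the bundled fallback rather than failing. A real integration would presumably need to check for native library support rather than class presence.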