[ 
https://issues.apache.org/jira/browse/HBASE-26258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-26258.
-----------------------------------------
    Fix Version/s:     (was: 3.0.0-alpha-2)
                       (was: 2.5.0)
       Resolution: Fixed

> Universal compression support
> -----------------------------
>
>                 Key: HBASE-26258
>                 URL: https://issues.apache.org/jira/browse/HBASE-26258
>             Project: HBase
>          Issue Type: Improvement
>          Components: HFile, Operability
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>
> Some Hadoop compression codecs became more available in recent Hadoop 3.x 
> releases, addressed by HBASE-25940. This is nice but still requires native 
> platform support, which to state the obvious is not available on all 
> platforms and architectures, even if native libraries for some are bundled 
> into jars. 
> Airlift's aircompressor 
> (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2 
> licensed library, for Java 8 and up, available in Maven central, which 
> provides pure Java implementations of desirable compression algorithms gzip, 
> lz4, lzo, snappy, and zstd, and Hadoop compression codecs for same, claiming 
> "_they are typically 300% faster than the JNI wrappers_." 
> (https://github.com/airlift/aircompressor). This library is under active 
> development and has up to date releases because it is used by Trino.
> We have another project that depends on universal availability of SNAPPY. I 
> would like to make this change as a general improvement which also satisfies 
> that requirement. (The as-yet-unnamed project will be contributed later.) 
> Universal ZSTD support would also be very nice to have. 
> Proposed changes:
> * Modify Compression.java such that compression codec implementation classes 
> can be specified by configuration. Currently they are hardcoded as strings. 
> * Pull in aircompressor as a 'compile' time dependency so it will be bundled 
> into our build and made available on the server classpath. 
> * Modify Compression.java to fall back to an aircompressor pure Java 
> implementation when the schema specifies a compression algorithm and a 
> Hadoop native codec is configured as the desired implementation, but the 
> requisite native support is not available. 
> The combination of these changes will provide universal (pure Java) support 
> for these desired and desirable compression codecs while retaining default 
> behavior, which is to load and utilize Hadoop native implementations of same, 
> if native support is available. They will also let you override this default 
> if you wish to chase the claimed benefits of the pure Java alternatives.
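
The selection logic described in the proposed changes could look roughly like the sketch below. It is illustrative only and is not the code in Compression.java: the class name CodecResolver, the configuration key pattern "example.compression.<algorithm>.codec", and the use of class loadability as a stand-in for "native support is available" are all assumptions made for this example. A real check would need to ask the Hadoop codec whether its native library is actually loaded.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper, not part of HBase: picks a codec implementation class name.
final class CodecResolver {

    // Assumed key pattern for illustration; not an actual HBase property name.
    static String configKey(String algorithm) {
        return "example.compression." + algorithm + ".codec";
    }

    // Returns the configured class (or the native default when nothing is
    // configured) if it is usable, otherwise the pure Java fallback.
    static String resolve(Map<String, String> conf,
                          String algorithm,
                          String nativeDefault,
                          String pureJavaFallback,
                          Predicate<String> usable) {
        String preferred = conf.getOrDefault(configKey(algorithm), nativeDefault);
        if (usable.test(preferred)) {
            return preferred;
        }
        if (usable.test(pureJavaFallback)) {
            return pureJavaFallback;
        }
        throw new IllegalStateException("No usable codec for " + algorithm);
    }

    // Simplified availability check: can the class be found on the classpath?
    // This only approximates a native-support check.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, CodecResolver.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

With an empty configuration map, resolve() returns the native default when it passes the availability check and the pure Java fallback otherwise. Setting the key for an algorithm overrides the default, which corresponds to the "override this default" case in the description. Passing the availability check as a Predicate keeps the sketch independent of how native support is actually detected.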



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
