[ 
https://issues.apache.org/jira/browse/HBASE-26259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418374#comment-17418374
 ] 

Andrew Kyle Purtell commented on HBASE-26259:
---------------------------------------------

I will upload new PDFs soon, once the latest round of microbenchmarks has 
finished. The results are better: LZ4 is even faster (50% faster than Hadoop 
native) and Snappy is not as slow (at worst 20% slower than Hadoop native).

Block size is the size of the data buffer compressed or decompressed each 
round, like an hfile block.

Sigma is the sigma parameter to the Zipfian distribution used to generate test 
data. A sigma of 1.1 produces data that is basically incompressible. As sigma 
increases from there, so does the compressibility of the data. A sigma of 2 
produces data that compresses OK (30-40%). A sigma of 5 produces data that 
compresses very well.
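
As a rough illustration of how sigma drives compressibility, here is a minimal 
sketch. It is not the attached RandomDistribution.java: the class and method 
names are hypothetical, byte values are drawn from a Zipf distribution over 
only 256 symbols, and java.util.zip.Deflater stands in for the codecs under 
test. The absolute ratios it prints will therefore not match the attached 
results; only the trend (higher sigma, smaller output) is expected to carry 
over.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical sketch, not part of the patch or its attachments.
class ZipfBlockSketch {

  /** Fills a block with byte values drawn from a Zipf distribution with exponent sigma. */
  static byte[] zipfBlock(int size, double sigma, long seed) {
    // Cumulative weights: P(rank k) is proportional to 1 / k^sigma, for k = 1..256.
    double[] cdf = new double[256];
    double sum = 0;
    for (int k = 0; k < 256; k++) {
      sum += 1.0 / Math.pow(k + 1, sigma);
      cdf[k] = sum;
    }
    Random rng = new Random(seed);
    byte[] block = new byte[size];
    for (int i = 0; i < size; i++) {
      double u = rng.nextDouble() * sum;
      int idx = Arrays.binarySearch(cdf, u);
      if (idx < 0) {
        idx = -idx - 1; // insertion point: first rank whose cumulative weight exceeds u
      }
      block[i] = (byte) Math.min(idx, 255);
    }
    return block;
  }

  /** Returns compressed size divided by original size, using DEFLATE as a stand-in codec. */
  static double deflateRatio(byte[] data) {
    Deflater deflater = new Deflater();
    deflater.setInput(data);
    deflater.finish();
    byte[] buf = new byte[8192];
    long compressed = 0;
    while (!deflater.finished()) {
      compressed += deflater.deflate(buf);
    }
    deflater.end();
    return (double) compressed / data.length;
  }

  public static void main(String[] args) {
    for (double sigma : new double[] {1.1, 2.0, 5.0}) {
      byte[] block = zipfBlock(64 * 1024, sigma, 42L);
      System.out.printf("sigma=%.1f ratio=%.3f%n", sigma, deflateRatio(block));
    }
  }
}
```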

Number of blocks is how many times in a loop the compressor is called, or how 
many blocks are written to or read from a compression stream (again, this is 
like how hfile blocks would be handled).

Time is average ms per op as measured by JMH. 

Error is the margin of error (±) around that average, as reported by JMH.

Difference is how much better or worse the codec provided in the patch 
performs compared to the corresponding Hadoop native codec, as a percentage.
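
One way to compute such a percentage is sketched below. The sign convention 
is an assumption, since the comment does not state one: here the value is the 
relative change in average time per op against the Hadoop native baseline, so 
a negative number means the patched codec took less time. The class name, 
method name, and timings are hypothetical.

```java
// Hypothetical sketch of the "Difference" column; the sign convention is assumed.
class DifferenceSketch {

  /** Relative change in average time per op, as a percentage of the baseline time. */
  static double percentDifference(double candidateMs, double baselineMs) {
    if (baselineMs <= 0) {
      throw new IllegalArgumentException("baseline must be positive");
    }
    return (candidateMs - baselineMs) / baselineMs * 100.0;
  }

  public static void main(String[] args) {
    // Made-up timings for illustration, not taken from the attached results.
    double patchedMs = 1.0;
    double nativeMs = 2.0;
    System.out.printf("difference=%.1f%%%n", percentDifference(patchedMs, nativeMs));
  }
}
```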

> Fallback support to pure Java compression
> -----------------------------------------
>
>                 Key: HBASE-26259
>                 URL: https://issues.apache.org/jira/browse/HBASE-26259
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-2
>
>         Attachments: BenchmarkCodec.java, BenchmarksMain.java, 
> RandomDistribution.java, ac_lz4_results.pdf, ac_snappy_results.pdf, 
> ac_zstd_results.pdf, lz4_lz4-java_result.pdf, xerial_snappy_results.pdf
>
>
> Airlift’s aircompressor 
> (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2 
> licensed library, for Java 8 and up, available in Maven central, which 
> provides pure Java implementations of gzip, lz4, lzo, snappy, and zstd and 
> Hadoop compression codecs for same, claiming “_they are typically 300% faster 
> than the JNI wrappers_.” (https://github.com/airlift/aircompressor). This 
> library is under active development, with up-to-date releases, because it is 
> used by Trino.
> Proposed changes:
> * Modify Compression.java such that compression codec implementation classes 
> can be specified by configuration. Currently they are hardcoded as strings.
> * Pull in aircompressor as a ‘compile’ time dependency so it will be bundled 
> into our build and made available on the server classpath.
> * Modify Compression.java to fall back to an aircompressor pure Java 
> implementation if the schema specifies a compression algorithm and a Hadoop 
> native codec was specified as the desired implementation, but the requisite 
> native support is not available.
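
The configuration-plus-fallback idea in the proposed changes could be sketched 
roughly as follows. This is not the code in the patch: the configuration key, 
the codec class names, and the nativeAvailable check are all hypothetical, and 
java.util.Properties stands in for the real configuration object.

```java
import java.util.Properties;
import java.util.function.Predicate;

// Hypothetical sketch of config-driven codec selection with a pure Java fallback.
class CodecFallbackSketch {
  // Made-up configuration key and class names, for illustration only.
  static final String LZ4_CODEC_KEY = "example.io.compress.lz4.codec";
  static final String NATIVE_LZ4 = "example.NativeLz4Codec";
  static final String PURE_JAVA_LZ4 = "example.PureJavaLz4Codec";

  /**
   * Returns the codec class name to load: the configured one, unless it is the
   * native codec and native support is reported as unavailable.
   */
  static String resolveCodecClass(Properties conf, Predicate<String> nativeAvailable) {
    // The implementation class comes from configuration instead of a hardcoded string.
    String configured = conf.getProperty(LZ4_CODEC_KEY, NATIVE_LZ4);
    if (configured.equals(NATIVE_LZ4) && !nativeAvailable.test(configured)) {
      return PURE_JAVA_LZ4; // fall back rather than fail
    }
    return configured;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Pretend the native library failed to load.
    System.out.println(resolveCodecClass(conf, name -> false));
  }
}
```

The availability check is passed in as a predicate so the selection logic can 
be exercised without a native library present.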



