[ 
https://issues.apache.org/jira/browse/HBASE-26259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422934#comment-17422934
 ] 

Andrew Kyle Purtell commented on HBASE-26259:
---------------------------------------------

*ITLCC Compression Test*

s3n://commoncrawl/crawl-data/CC-MAIN-2021-10/segments/1614178347293.1/warc/CC-MAIN-20210224165708-20210224195708-00000.warc.gz



(310,856 rows)

Note: A deflate codec implementation is not proposed as part of this 
contribution. It is just a simple integration of the Java runtime environment's 
java.util.zip.Deflater and java.util.zip.Inflater offered as a baseline 
comparison.
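
For readers unfamiliar with the baseline, here is a minimal sketch of a round trip through java.util.zip.Deflater and java.util.zip.Inflater. It is not the benchmark integration itself; the class name DeflateBaseline, the helper methods, and the 4 KB buffer size are assumptions made for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of the HBASE-26259 patch.
final class DeflateBaseline {

  /** Compresses input at the given zlib level (1 = fastest, 9 = smallest output). */
  static byte[] compress(byte[] input, int level) {
    Deflater deflater = new Deflater(level);
    try {
      deflater.setInput(input);
      deflater.finish(); // no more input will follow
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release the native zlib state
    }
  }

  /** Reverses compress(); rejects corrupt or truncated streams. */
  static byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalStateException("truncated or unsupported deflate stream");
        }
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt deflate stream", e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    byte[] data = "hbase hbase hbase hbase hbase hbase hbase hbase"
        .getBytes(StandardCharsets.UTF_8);
    // Same levels as the Deflate rows in the table: min, default, max.
    for (int level : new int[] {1, 6, 9}) {
      byte[] packed = compress(data, level);
      System.out.println("level " + level + ": " + data.length + " -> " + packed.length + " bytes");
    }
  }
}
```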

||Codec||Level||Total compacted size||Compression||Major compaction time (seconds)||
| | | | | |
|None|-|5,121,804,020|-|-|
| | | | | |
|lz4 (aircompressor)|-|1,525,277,027|70.2%|16|
|lz4 (lz4-java)|-|1,524,932,289|70.2%|13|
|lzo (aircompressor)|-|1,578,220,788|69.2%|15|
|Snappy (aircompressor)|-|1,595,784,200|68.8%|14|
|Snappy (xerial)|-|1,604,659,551|68.7%|17|
|ZStandard (aircompressor)|(dfast)|1,074,637,595|79.0%|32|
|ZStandard (zstd-jni)|1 (fast min)|1,111,058,386|78.3%|18|
|ZStandard (zstd-jni)|2 (fast max)|1,076,837,323|79.0%|20|
|ZStandard (zstd-jni)|4 (dfast max)|1,034,994,492|79.8%|33|
|ZStandard (zstd-jni)|5 (greedy)|1,021,233,357|80.1%|38|
|ZStandard (zstd-jni)|6 (lazy)|1,004,341,613|80.4%|50|
|ZStandard (zstd-jni)|12 (lazy2 max)|975,070,595|81.0%|225|
|ZStandard (zstd-jni)|15 (btlazy2 max)|909,137,633|82.2%|446|
|ZStandard (zstd-jni)|18 (btopt max)|905,353,364|82.3%|1214|
|ZStandard (zstd-jni)|22 (btultra max)|903,207,510|82.4%|2406|
|LZMA|1 (min)|972,730,691|81.0%|201|
|LZMA|3 (default)|955,130,988|81.4%|264|
|LZMA|6|911,930,048|82.2%|1260|
|LZMA|9 (max)|911,929,521|82.2%|2124|
| | | | | |
|Deflate (java.util.zip)|1 (min)|1,227,907,255|76.0%|43|
|Deflate (java.util.zip)|6 (default)|1,059,481,372|79.3%|89|
|Deflate (java.util.zip)|9 (max)|1,013,628,808|80.2%|147|
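
The Compression column appears to be the space saved relative to the uncompressed ("None") row, that is, 1 minus compacted size over uncompressed size. The sketch below recomputes two rows under that assumption; the class and method names are made up for illustration.

```java
// Hypothetical helper for checking the table, not part of the patch.
final class CompressionSavings {

  /** Percentage of bytes saved relative to the uncompressed size. */
  static double savingsPercent(long uncompressedSize, long compactedSize) {
    return 100.0 * (1.0 - (double) compactedSize / uncompressedSize);
  }

  public static void main(String[] args) {
    long none = 5_121_804_020L; // "None" row: total compacted size without compression
    System.out.printf("lz4 (lz4-java):   %.1f%%%n", savingsPercent(none, 1_524_932_289L));
    System.out.printf("zstd-jni level 6: %.1f%%%n", savingsPercent(none, 1_004_341_613L));
  }
}
```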

> Fallback support to pure Java compression
> -----------------------------------------
>
>                 Key: HBASE-26259
>                 URL: https://issues.apache.org/jira/browse/HBASE-26259
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-2
>
>         Attachments: BenchmarkCodec.java, BenchmarksMain.java, 
> RandomDistribution.java, ac_lz4_results.pdf, ac_snappy_results.pdf, 
> ac_zstd_results.pdf, lz4_lz4-java_result.pdf, xerial_snappy_results.pdf
>
>
> Airlift’s aircompressor 
> (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2 
> licensed library, for Java 8 and up, available in Maven Central, which 
> provides pure Java implementations of gzip, lz4, lzo, snappy, and zstd, and 
> Hadoop compression codecs for the same, claiming “_they are typically 300% 
> faster than the JNI wrappers_.” (https://github.com/airlift/aircompressor). 
> This library is under active development and receives up-to-date releases 
> because it is used by Trino.
> Proposed changes:
> * Modify Compression.java such that compression codec implementation classes 
> can be specified by configuration. Currently they are hardcoded as strings.
> * Pull in aircompressor as a ‘compile’ time dependency so it will be bundled 
> into our build and made available on the server classpath.
> * Modify Compression.java to fall back to an aircompressor pure Java 
> implementation if the schema specifies a compression algorithm and a Hadoop 
> native codec was specified as the desired implementation, but the requisite 
> native support is not available.
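
As a rough illustration of the configuration-plus-fallback idea in the proposed changes, the sketch below resolves a codec class name from configuration and falls back to a second class when the preferred one cannot be loaded. It is not the code from the patch: the property key, the class CodecFallback, and the stand-in class names are all hypothetical, and a real native-availability check may happen later than class loading, which this sketch does not cover.

```java
import java.util.Properties;

// Hypothetical sketch, not the Compression.java change itself.
final class CodecFallback {

  /** Returns the first named class that loads and initializes, in order. */
  static Class<?> loadFirstAvailable(String... classNames) {
    for (String name : classNames) {
      try {
        return Class.forName(name);
      } catch (ClassNotFoundException | LinkageError e) {
        // Class is missing, or its static initializer failed
        // (for example, a native library could not be linked); try the next one.
      }
    }
    throw new IllegalStateException(
        "no usable codec implementation among " + String.join(", ", classNames));
  }

  /** Prefers the configured implementation, else the default, else the pure Java fallback. */
  static Class<?> resolve(Properties conf, String key, String defaultImpl, String pureJavaFallback) {
    String preferred = conf.getProperty(key, defaultImpl);
    return loadFirstAvailable(preferred, pureJavaFallback);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Made-up key and class name standing in for an unavailable native codec.
    conf.setProperty("example.codec.class", "no.such.NativeCodec");
    // JDK classes stand in for real codec classes so the sketch runs anywhere.
    Class<?> impl = resolve(conf, "example.codec.class", "java.lang.String", "java.util.ArrayList");
    System.out.println("resolved implementation: " + impl.getName());
  }
}
```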



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
