[
https://issues.apache.org/jira/browse/HBASE-26259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422934#comment-17422934
]
Andrew Kyle Purtell edited comment on HBASE-26259 at 9/30/21, 5:50 PM:
-----------------------------------------------------------------------
*ITLCC Compression Test*
s3n://commoncrawl/crawl-data/CC-MAIN-2021-10/segments/1614178347293.1/warc/CC-MAIN-20210224165708-20210224195708-00000.warc.gz
(310,856 rows)
The aircompressor pure Java compressors are competitive with JNI integrations
and provide reasonable fallback options when native support (e.g., dynamic link
libraries) cannot be made available.
The fastest option overall is lz4-java (13 seconds for major compaction). It
offers both excellent compression and decompression performance and a
reasonable compression ratio (70.2%). Snappy options are also very fast (14 to
17 seconds) but compress slightly less (68.7% to 68.8%).
Note: A deflate codec implementation is not proposed as part of this
contribution. It is just a simple integration of the Java runtime environment's
java.util.zip.Deflater and java.util.zip.Inflater offered as a baseline
comparison.
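As a rough illustration of what such a baseline integration involves, the sketch below round-trips a buffer through java.util.zip.Deflater and java.util.zip.Inflater at a chosen level. It is not the code used for this test; the class and method names (DeflateBaseline, compress, decompress) and the sample input are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of HBase or of the proposed patch.
final class DeflateBaseline {

  /** Compresses input with the JRE Deflater at the given level (1 = fastest, 9 = smallest). */
  static byte[] compress(byte[] input, int level) {
    Deflater deflater = new Deflater(level);
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release the native zlib state held by the JRE
    }
  }

  /** Decompresses a buffer produced by compress(). */
  static byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalArgumentException("truncated or dictionary-dependent input");
        }
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("not valid deflate data", e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    // Build a repetitive sample so that compression has something to work with.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1000; i++) {
      sb.append("row-").append(i % 10).append(" value ");
    }
    byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

    for (int level : new int[] {1, 6, 9}) {
      byte[] packed = compress(original, level);
      boolean ok = Arrays.equals(original, decompress(packed));
      System.out.println("level " + level + ": " + original.length + " -> "
          + packed.length + " bytes, round trip ok = " + ok);
    }
  }
}
```

The levels 1, 6, and 9 mirror the min, default, and max rows reported for Deflate in the table.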
||Codec||Level||Total compacted size||Compression||Major compaction time (seconds)||
| | | | | |
|None|-|5,121,804,020|-|-|
| | | | | |
|lz4 (aircompressor)|-|1,525,277,027|70.2%|16|
|lz4 (lz4-java)|-|1,524,932,289|70.2%|13|
|lzo (aircompressor)|-|1,578,220,788|69.2%|15|
|Snappy (aircompressor)|-|1,595,784,200|68.8%|14|
|Snappy (xerial)|-|1,604,659,551|68.7%|17|
|ZStandard (aircompressor)|(dfast)|1,074,637,595|79.0%|32|
|ZStandard (zstd-jni)|1 (fast min)|1,111,058,386|78.3%|18|
|ZStandard (zstd-jni)|2 (fast max)|1,076,837,323|79.0%|20|
|ZStandard (zstd-jni)|4 (dfast max)|1,034,994,492|79.8%|33|
|ZStandard (zstd-jni)|5 (greedy)|1,021,233,357|80.1%|38|
|ZStandard (zstd-jni)|6 (lazy)|1,004,341,613|80.4%|50|
|ZStandard (zstd-jni)|12 (lazy2 max)|975,070,595|81.0%|225|
|ZStandard (zstd-jni)|15 (btlazy2 max)|909,137,633|82.2%|446|
|ZStandard (zstd-jni)|18 (btopt max)|905,353,364|82.3%|1214|
|ZStandard (zstd-jni)|22 (btultra max)|903,207,510|82.4%|2406|
|LZMA|1 (min)|972,730,691|81.0%|201|
|LZMA|3 (default)|955,130,988|81.4%|264|
|LZMA|6|911,930,048|82.2%|1260|
|LZMA|9 (max)|911,929,521|82.2%|2124|
| | | | | |
|Deflate (java.util.zip)|1 (min)|1,227,907,255|76.0%|43|
|Deflate (java.util.zip)|6 (default)|1,059,481,372|79.3%|89|
|Deflate (java.util.zip)|9 (max)|1,013,628,808|80.2%|147|
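
The Compression column is consistent with the space saved relative to the uncompressed ("None") row, that is, 1 minus the ratio of compacted size to uncompressed size. That formula is inferred from the numbers rather than stated in the comment, so treat the sketch below as an assumption; the class and method names are illustrative only.

```java
// Hypothetical helper that recomputes the Compression column from the sizes in the table.
final class CompressionSavings {

  /** Percentage of space saved relative to the uncompressed size. */
  static double percentSaved(long uncompressedBytes, long compactedBytes) {
    return 100.0 * (1.0 - (double) compactedBytes / (double) uncompressedBytes);
  }

  public static void main(String[] args) {
    long none = 5_121_804_020L; // "None" row: total compacted size with no compression

    // Sizes copied from the table rows named in each label.
    System.out.printf("lz4 (lz4-java):      %.1f%%%n", percentSaved(none, 1_524_932_289L));
    System.out.printf("Snappy (xerial):     %.1f%%%n", percentSaved(none, 1_604_659_551L));
    System.out.printf("ZStandard level 22:  %.1f%%%n", percentSaved(none, 903_207_510L));
    System.out.printf("Deflate level 1:     %.1f%%%n", percentSaved(none, 1_227_907_255L));
  }
}
```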
> Fallback support to pure Java compression
> -----------------------------------------
>
> Key: HBASE-26259
> URL: https://issues.apache.org/jira/browse/HBASE-26259
> Project: HBase
> Issue Type: Sub-task
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
> Attachments: BenchmarkCodec.java, BenchmarksMain.java,
> RandomDistribution.java, ac_lz4_results.pdf, ac_snappy_results.pdf,
> ac_zstd_results.pdf, lz4_lz4-java_result.pdf, xerial_snappy_results.pdf
>
>
> Airlift’s aircompressor
> (https://search.maven.org/artifact/io.airlift/aircompressor) is an Apache 2
> licensed library, for Java 8 and up, available in Maven central, which
> provides pure Java implementations of gzip, lz4, lzo, snappy, and zstd, along
> with Hadoop compression codecs for the same, claiming “_they are typically
> 300% faster than the JNI wrappers_.” (https://github.com/airlift/aircompressor).
> This library is under active development, with up-to-date releases, because
> it is used by Trino.
> Proposed changes:
> * Modify Compression.java such that compression codec implementation classes
> can be specified by configuration. Currently they are hardcoded as strings.
> * Pull in aircompressor as a ‘compile’ time dependency so it will be bundled
> into our build and made available on the server classpath.
> * Modify Compression.java to fall back to an aircompressor pure Java
> implementation if the schema specifies a compression algorithm and a Hadoop
> native codec was specified as the desired implementation, but the requisite
> native support is not available.
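
The fallback idea in the proposed changes can be sketched roughly as follows: try the configured implementation class first, and if it cannot be loaded or linked, move on to the next candidate. This is only an illustration of the pattern, not the actual change to Compression.java. The class CodecFallback, the method loadFirstAvailable, and the class name org.example.hypothetical.NativeLz4Codec are all invented for the example, and java.util.zip.Deflater merely stands in for a pure Java fallback that is always on the classpath.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "preferred implementation first, pure Java fallback second".
final class CodecFallback {

  /**
   * Returns the first class in the list that can be loaded and initialized.
   * Candidates are expected in priority order, e.g. a native-backed codec
   * followed by a pure Java one.
   */
  static Class<?> loadFirstAvailable(List<String> candidateClassNames) {
    for (String name : candidateClassNames) {
      try {
        return Class.forName(name);
      } catch (ClassNotFoundException | LinkageError e) {
        // Class is missing, or its static initializer could not link a native
        // library (UnsatisfiedLinkError is a LinkageError); try the next one.
      }
    }
    throw new IllegalStateException(
        "No codec implementation available among " + candidateClassNames);
  }

  public static void main(String[] args) {
    // In a real deployment the names would come from configuration.
    List<String> candidates = Arrays.asList(
        "org.example.hypothetical.NativeLz4Codec", // assumed to be unavailable
        "java.util.zip.Deflater");                 // stand-in for a pure Java fallback
    System.out.println("Selected: " + loadFirstAvailable(candidates).getName());
  }
}
```

One caveat with this approach: some native-backed codecs load fine and only fail when first used, so a production version would likely also need a usability check beyond class loading.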