[ https://issues.apache.org/jira/browse/HBASE-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425313#comment-17425313 ]
Andrew Kyle Purtell edited comment on HBASE-26316 at 10/7/21, 2:20 AM:
-----------------------------------------------------------------------
Confirmed functionality with a single host cluster:
{noformat}
hbase> create "IntegrationTestLoadCommonCrawl", \
{ NAME => 'c', VERSIONS => 1000, COMPRESSION => 'LZ4', COMPRESSION_COMPACT => 'ZSTD', BLOCKSIZE => 131072 }, \
{ NAME => 'i', VERSIONS => 1000, COMPRESSION => 'LZ4', COMPRESSION_COMPACT => 'ZSTD', BLOCKSIZE => 8192 }, \
CONFIGURATION => { 'hbase.io.compress.zstd.level' => '1' }
{noformat}
Loaded one WARC file from Common Crawl.
A major compaction takes 11 seconds.
{noformat}
hbase> alter "IntegrationTestLoadCommonCrawl", \
CONFIGURATION => { 'hbase.io.compress.zstd.level' => '10' }
{noformat}
Major compaction now takes 42 seconds. Total size on disk reduced by 11.6%.
{noformat}
hbase> alter "IntegrationTestLoadCommonCrawl", \
CONFIGURATION => { 'hbase.io.compress.zstd.level' => '22' }
{noformat}
Major compaction now takes 17 minutes 15 seconds. Total size on disk reduced by
13.3% vs level 1. (Sure, this level is crazy in practice.)
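To put the three measurements side by side, here is a minimal Java sketch that computes how much longer each compaction took relative to level 1. The class and method names are made up for this illustration, and it assumes the 11.6% figure is also measured against level 1, which the comment does not state explicitly.

```java
public class CompactionTradeoff {
    /** How many times longer a compaction took compared with the baseline run. */
    public static double slowdown(double baselineSeconds, double seconds) {
        return seconds / baselineSeconds;
    }

    public static void main(String[] args) {
        double level1 = 11;             // seconds at level 1
        double level10 = 42;            // seconds at level 10
        double level22 = 17 * 60 + 15;  // 17 minutes 15 seconds at level 22

        System.out.printf("level 10: %.1fx the level 1 time%n", slowdown(level1, level10));
        System.out.printf("level 22: %.1fx the level 1 time%n", slowdown(level1, level22));
        // Assumes both size reductions are relative to level 1.
        System.out.printf("extra reduction from level 10 to 22: %.1f points%n", 13.3 - 11.6);
    }
}
```

Under that assumption, level 10 costs roughly 3.8 times the level 1 compaction time, and level 22 roughly 94 times, for about 1.7 additional percentage points of size reduction over level 10.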
> Per-table or per-CF compression codec setting overrides
> -------------------------------------------------------
>
> Key: HBASE-26316
> URL: https://issues.apache.org/jira/browse/HBASE-26316
> Project: HBase
> Issue Type: Sub-task
> Components: HFile, Operability
> Affects Versions: 2.5.0, 3.0.0-alpha-2
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> This won't work as expected today...
> {noformat}
> hbase> create 'sometable', \
> { NAME => 'somefamily', VERSIONS => 1000, COMPRESSION => 'ZSTD' }, \
> CONFIGURATION => { 'hbase.io.compress.zstd.level' => '9' }
> {noformat}
> ... but it should. We get and retain Compressor instances in
> HFileBlockDefaultEncodingContext, and could in theory call Compressor#reinit
> when setting up the context, to update compression parameters like
> compression level and buffer size per the ambient configuration, but we do
> not plumb through the CompoundConfiguration from the Store into
> HFileBlockDefaultEncodingContext. Instead, we can only update codec parameters
> globally in the site configuration files.
> This is actually pretty important for algorithms like ZSTD, which offers more
> than 20 different compression levels. At level 1 it compresses almost as fast
> as LZ4, while at levels > 19 it uses computationally expensive techniques to
> rival LZMA in compression ratio (and in poor compression speed). It is very
> likely that the ZSTD level you'd want to employ for a given table's data will
> vary by use case.
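As a rough illustration of the override behavior this issue asks for, here is a minimal Java sketch of a layered lookup in which a table-level or column-family-level value for hbase.io.compress.zstd.level shadows the site-wide one. LayeredConf is a hypothetical stand-in written for this example, not HBase's CompoundConfiguration, and the most-specific-first precedence order is an assumption.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical layered configuration; not HBase's CompoundConfiguration. */
public class LayeredConf {
    // Layers ordered from most specific to least specific (assumed precedence).
    private final List<Map<String, String>> layers;

    public LayeredConf(List<Map<String, String>> layers) {
        this.layers = layers;
    }

    /** Returns the value from the first layer that defines the key, else the default. */
    public int getInt(String key, int defaultValue) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return Integer.parseInt(value.trim());
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        String key = "hbase.io.compress.zstd.level";
        Map<String, String> family = Map.of();        // no per-CF override
        Map<String, String> table = Map.of(key, "9"); // per-table override
        Map<String, String> site = Map.of(key, "3");  // site-wide setting (example value)

        LayeredConf conf = new LayeredConf(List.of(family, table, site));
        System.out.println(conf.getInt(key, 3)); // prints 9
    }
}
```

With the layers shown in main, the table-level value wins over the site-wide one. The description above suggests that a value resolved this way could then be handed to the codec through Compressor#reinit when the encoding context is set up.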