[ 
https://issues.apache.org/jira/browse/HBASE-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425313#comment-17425313
 ] 

Andrew Kyle Purtell edited comment on HBASE-26316 at 10/7/21, 2:20 AM:
-----------------------------------------------------------------------

Confirmed functionality with a single host cluster:

{noformat}
hbase> create "IntegrationTestLoadCommonCrawl", \
    { NAME => 'c', VERSIONS => 1000, COMPRESSION => 'LZ4', COMPRESSION_COMPACT => 'ZSTD', BLOCKSIZE => 131072 }, \
    { NAME => 'i', VERSIONS => 1000, COMPRESSION => 'LZ4', COMPRESSION_COMPACT => 'ZSTD', BLOCKSIZE => 8192 }, \
    CONFIGURATION => { 'hbase.io.compress.zstd.level' => '1' }
{noformat}

Loaded one WARC file from Common Crawl.

Major compaction takes 11 seconds.

{noformat}
hbase> alter "IntegrationTestLoadCommonCrawl", \
    CONFIGURATION => { 'hbase.io.compress.zstd.level' => '10' }
{noformat}

Major compaction now takes 42 seconds. Total size on disk reduced by 11.6%.

{noformat}
hbase> alter "IntegrationTestLoadCommonCrawl", \
    CONFIGURATION => { 'hbase.io.compress.zstd.level' => '22' }
{noformat}

Major compaction now takes 17 minutes 15 seconds. Total size on disk reduced by 
13.3% vs level 1. (Sure, this level is crazy in practice.) 
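
To put the three measurements side by side, here is a small Ruby sketch of the arithmetic. It uses only the numbers reported above; the variable and method names are made up for illustration, and it assumes the 11.6% figure for level 10 is also relative to the level 1 on-disk size.

```ruby
# Measurements from this comment, keyed by ZSTD level.
# reduction_pct is the on-disk size reduction relative to level 1
# (assumed for level 10, stated explicitly for level 22).
results = {
  1  => { seconds: 11,           reduction_pct: 0.0 },
  10 => { seconds: 42,           reduction_pct: 11.6 },
  22 => { seconds: 17 * 60 + 15, reduction_pct: 13.3 }
}

# How many times longer major compaction took than at the baseline level.
def slowdown(results, level, baseline = 1)
  results.fetch(level)[:seconds].to_f / results.fetch(baseline)[:seconds]
end

results.each do |level, r|
  puts format('level %2d: %4d s (%.1fx level 1), %.1f%% smaller on disk',
              level, r[:seconds], slowdown(results, level), r[:reduction_pct])
end
```

On these numbers, level 10 costs roughly 3.8x the compaction time of level 1 for an 11.6% size reduction, while level 22 costs roughly 94x for only 1.7 percentage points more.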


> Per-table or per-CF compression codec setting overrides
> -------------------------------------------------------
>
>                 Key: HBASE-26316
>                 URL: https://issues.apache.org/jira/browse/HBASE-26316
>             Project: HBase
>          Issue Type: Sub-task
>          Components: HFile, Operability
>    Affects Versions: 2.5.0, 3.0.0-alpha-2
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Minor
>             Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> This won't work as expected today...
> {noformat}
> hbase> create 'sometable', \
>   { NAME => 'somefamily', VERSIONS => 1000, COMPRESSION => 'ZSTD' }, \
>   CONFIGURATION => { 'hbase.io.compress.zstd.level' => '9' }
> {noformat}
> ... but it should. We get and retain Compressor instances in 
> HFileBlockDefaultEncodingContext, and could in theory call Compressor#reinit 
> when setting up the context, to update compression parameters like 
> compression level and buffer size per the ambient configuration, but we do 
> not plumb through the CompoundConfiguration from the Store into 
> HFileBlockDefaultEncodingContext. Instead, codec parameters can only be 
> updated globally in system site conf files.
> This is actually pretty important for algorithms like ZSTD, which offers more 
> than 20 different compression levels: at level 1 it is almost as fast 
> at compression as LZ4, while at levels > 19 it uses computationally 
> expensive techniques to rival LZMA in compression ratio (at the cost of poor 
> compression speed). It is very likely that the ZSTD level you'd want to employ 
> for a given table's data will vary by use case.
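
The override behavior the description asks for can be pictured as a layered lookup, where the most specific setting wins. The Ruby sketch below is a toy model only, not HBase code: the `lookup` helper, the hash names, and the site-level value of '3' are all hypothetical, and the precedence order (column family over table over site) is an assumption about how the Store's CompoundConfiguration is layered.

```ruby
# Toy model of layered configuration lookup; not HBase code.
# layers are ordered most specific first; the first layer that
# defines the key wins, otherwise the default is returned.
def lookup(layers, key, default = nil)
  layers.each { |layer| return layer[key] if layer.key?(key) }
  default
end

KEY = 'hbase.io.compress.zstd.level'

site   = { KEY => '3' }  # hypothetical cluster-wide value from site conf
table  = { KEY => '9' }  # table-level CONFIGURATION, as in the example above
family = {}              # no per-CF override in this example

# Assumed precedence: column family, then table, then site.
level = Integer(lookup([family, table, site], KEY))
puts "effective ZSTD level: #{level}"
```

In this model the table-level value of '9' is what a codec would see when it is reinitialized for the store, which is the outcome the `create 'sometable'` example expects.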



