[ https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573801#comment-17573801 ]

Wellington Chevreuil commented on HBASE-27264:
----------------------------------------------

{quote}
Can we have a single unified config for them?
{quote}
 Such "unified" behaviour would already be achieved by the 
"hbase.block.size.limit.compressed" and  "hbase.block.size.max.compressed". 
Because we compress the block after encoding, so even when encoding is on, we 
really are checking the encoded and compressed size here.
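
To illustrate that ordering, here is a minimal sketch (not the actual HFileBlock writer code; the class, method names and the 320KB constant are placeholders standing in for the proposed "hbase.block.size.max.compressed" default):

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class BlockSizeCheckSketch {
  // Placeholder for the proposed "hbase.block.size.max.compressed" default (320KB).
  static final int MAX_COMPRESSED_BLOCK_SIZE = 320 * 1024;

  // The block payload is encoded first and compressed afterwards, so a size
  // check taken at this point sees the encoded *and* compressed size.
  static boolean shouldCloseBlock(byte[] rawCells) throws IOException {
    byte[] encoded = encode(rawCells);       // data block encoding step
    byte[] compressed = compress(encoded);   // compression runs on the encoded bytes
    return compressed.length >= MAX_COMPRESSED_BLOCK_SIZE;
  }

  static byte[] encode(byte[] in) {
    return in; // stand-in for a real data block encoder
  }

  static byte[] compress(byte[] in) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(in);
    }
    return bos.toByteArray();
  }
}
{code}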

The original goal of the unified.encoded.blocksize.ratio property was to give 
consistent block sizes in order to avoid fragmentation in the bucket cache. It 
had a bug, though: if the encoding compression efficiency was higher than the 
configured unified.encoded.blocksize.ratio value, we would still see varying 
block sizes and fragmentation. HBASE-27232 fixed this problem. Now, with the 
fix, if the ratio is set to 1, we get blocks of the actual encoded size 
(which was not possible before because of the bug).

So keeping these separate gives us the ability to choose at which level we 
want to delimit blocks, with extra control over fragmentation in the case 
where only encoding is used.
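
For illustration, a minimal sketch of setting these properties programmatically (the property names and the 320KB default are the ones proposed here and may differ in the final patch; the ratio value of 1.0 is just an example):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BlockSizeConfigExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // Keep encoded block sizes consistent to limit bucket cache fragmentation;
    // with the HBASE-27232 fix, 1.0 gives blocks of the actual encoded size.
    conf.setDouble("hbase.writer.unified.encoded.blocksize.ratio", 1.0);

    // Delimit blocks by their compressed (post-encoding) size...
    conf.setBoolean("hbase.block.size.limit.compressed", true);

    // ...but cap the compressed block size to avoid very large uncompressed blocks.
    conf.setLong("hbase.block.size.max.compressed", 320 * 1024);
  }
}
{code}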

> Add options to consider compressed size when delimiting blocks during hfile 
> writes
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-27264
>                 URL: https://issues.apache.org/jira/browse/HBASE-27264
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>
> In HBASE-27232 we modified the "hbase.writer.unified.encoded.blocksize.ratio" 
> property so that it allows the encoded size to be considered when delimiting 
> hfile blocks during writes.
> Here we propose two additional properties, "hbase.block.size.limit.compressed" 
> and "hbase.block.size.max.compressed", that would allow the compressed size 
> (if compression is in use) to be considered when delimiting blocks during hfile 
> writing. When compression is enabled, certain datasets can have very high 
> compression efficiency, so the default 64KB block size and 10GB max file 
> size can lead to hfiles with a very large number of blocks. 
> In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that 
> switches block delimiting to the compressed size, and 
> "hbase.block.size.max.compressed" is an int giving the limit, in bytes, for the 
> compressed block size (defaulting to 320KB), in order to avoid very large 
> uncompressed blocks.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
