[ 
https://issues.apache.org/jira/browse/HBASE-27232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569403#comment-17569403
 ] 

Bryan Beaudreault commented on HBASE-27232:
-------------------------------------------

[~wchevreuil] this may be out of scope for your issue here, but it seems you
have experience with this config. Looking back at HBASE-17757, there is some
discussion of the feature, but no one could really say which metrics would
help users know when to change this setting (and to what value). Any chance
you have some guidance on that based on your experience? It could go in a
separate JIRA; I'm just using the opportunity of this feature being touched
here to bring it up.

> Fix checking for encoded block size when deciding if block should be closed
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-27232
>                 URL: https://issues.apache.org/jira/browse/HBASE-27232
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>
> In HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and
> uncompressed data size when deciding to close a block and start a new one.
> That could lead to varying "on-disk" block sizes, depending on the encoding
> efficiency for the cells in each block.
> HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio
> property, as a ratio of the original configured block size, to be compared
> against the encoded size. This was an attempt to ensure homogeneous block
> sizes. However, the check introduced by HBASE-17757 also considers the
> unencoded size, so in cases where encoding efficiency is higher than
> what's configured in hbase.writer.unified.encoded.blocksize.ratio, it would
> still lead to varying block sizes.
> This patch changes that logic to consider only the encoded size if the
> hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it
> considers the unencoded size. This gives finer control over the on-disk
> block sizes and the overall number of blocks when encoding is in use.
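
The difference between the two checks described above can be sketched roughly as follows. This is an illustration only, not the actual HFileWriterImpl code: the class name, method names, and parameters are hypothetical, and treating a ratio of 0 or less as "not set" is an assumption made for the example.

```java
// Illustrative sketch only; names and the "ratio <= 0 means unset"
// convention are assumptions, not the real HBase implementation.
final class BlockBoundarySketch {

  /** Check as described for HBASE-17757: either size can close the block. */
  static boolean shouldCloseBefore(long encodedSize, long unencodedSize,
      long blockSize, double ratio) {
    long encodedLimit = (long) (blockSize * ratio);
    return encodedSize >= encodedLimit || unencodedSize >= blockSize;
  }

  /** Check as described for this patch: only one size is consulted. */
  static boolean shouldCloseAfter(long encodedSize, long unencodedSize,
      long blockSize, double ratio) {
    if (ratio > 0) {
      // Ratio configured: compare only the encoded size against its limit.
      return encodedSize >= (long) (blockSize * ratio);
    }
    // Ratio not configured: fall back to the unencoded size.
    return unencodedSize >= blockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64 * 1024; // configured block size
    double ratio = 0.5;         // target encoded size of 32 KB
    // Encoding shrinks this block to 25% of its unencoded size.
    long unencoded = 64 * 1024;
    long encoded = 16 * 1024;
    // Old check closes the block at 16 KB encoded, below the 32 KB target.
    System.out.println(shouldCloseBefore(encoded, unencoded, blockSize, ratio));
    // New check keeps writing until the encoded size reaches the target.
    System.out.println(shouldCloseAfter(encoded, unencoded, blockSize, ratio));
  }
}
```

With these example numbers, the older check closes the block because the unencoded size has reached the configured block size, even though the encoded size is only half of the encoded target. The newer check leaves the block open.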



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
