[ 
https://issues.apache.org/jira/browse/HBASE-27232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569503#comment-17569503
 ] 

Wellington Chevreuil commented on HBASE-27232:
----------------------------------------------

Hi [~bbeaudreault] , whilst experimenting with bucket cache, we kept monitoring 
the _blockCacheCount RegionServer_ metric and comparing it against the expected 
total number of blocks (roughly, the table's total store file size divided by 
64KB, the default block size). When these two diverged too much (in one case 
the number of blocks was 10x higher than expected by the above calculation), 
we inspected individual hfiles with the pretty printer tool. The "-h" option 
prints the block headers, which include "onDiskSizeWithoutHeader". Based on 
this "onDiskSizeWithoutHeader" value, we can gauge how much the encoding is 
compressing the block data, then set 
hbase.writer.unified.encoded.blocksize.ratio accordingly.
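The estimate above can be sketched as a quick back-of-the-envelope check (this is illustrative code, not HBase internals; the class and method names are hypothetical):

```java
// Sketch: estimate the expected block count from total store file size and
// the configured block size, and flag a large divergence from the observed
// blockCacheCount metric. Names here are illustrative, not HBase APIs.
public class BlockCountCheck {

    // Default hfile.block.size is 64KB.
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024;

    // Expected block count: total store file bytes divided by the block
    // size, rounded up.
    static long expectedBlocks(long totalStoreFileBytes, long blockSize) {
        return (totalStoreFileBytes + blockSize - 1) / blockSize;
    }

    // True when the observed count (e.g. blockCacheCount) exceeds the
    // estimate by more than the given factor (we saw ~10x in one case).
    static boolean diverges(long observed, long expected, double factor) {
        return observed > expected * factor;
    }

    public static void main(String[] args) {
        // Example: a table with 10GB of store files and the default 64KB blocks.
        long expected = expectedBlocks(10L * 1024 * 1024 * 1024, DEFAULT_BLOCK_SIZE);
        System.out.println("expected blocks: " + expected);
        System.out.println("10x divergence at 12x observed: "
            + diverges(expected * 12, expected, 10.0));
    }
}
```

When the observed count diverges like this, the per-block "onDiskSizeWithoutHeader" values from the pretty printer show how far the encoding shrinks each block below the configured size.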

> Fix checking for encoded block size when deciding if block should be closed
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-27232
>                 URL: https://issues.apache.org/jira/browse/HBASE-27232
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>
> On HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and 
> uncompressed data size when deciding to close a block and start a new one. 
> That could lead to varying "on-disk" block sizes, depending on the encoding 
> efficiency for the cells in each block.
> HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio 
> property, as a ratio of the originally configured block size, to be compared 
> against the encoded size. This was an attempt to ensure homogeneous block 
> sizes. However, the check introduced by HBASE-17757 also considers the 
> unencoded size; in cases where the encoding efficiency is higher than 
> what's configured in hbase.writer.unified.encoded.blocksize.ratio, this 
> would still lead to varying block sizes.
> This patch changes that logic to consider only the encoded size when the 
> hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it 
> considers the unencoded size. This gives finer control over the on-disk 
> block sizes and the overall number of blocks when encoding is in use.
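The changed boundary check can be sketched roughly as follows (a minimal sketch, not the actual HFileWriterImpl code; parameter names are illustrative):

```java
// Sketch of the revised block-boundary decision: if the encoded-size ratio
// property is configured, only the encoded size drives the decision to close
// the current block; otherwise the original unencoded-size check applies.
// This is illustrative code, not the HBase implementation.
public class BlockBoundarySketch {

    // ratioConfigured mirrors whether hbase.writer.unified.encoded.blocksize.ratio
    // was explicitly set; encodedRatio is its value, configuredBlockSize is
    // the hfile.block.size setting.
    static boolean shouldCloseBlock(long unencodedSize, long encodedSize,
                                    long configuredBlockSize, double encodedRatio,
                                    boolean ratioConfigured) {
        if (ratioConfigured) {
            // Compare the encoded size against ratio * configured block size only,
            // so highly compressible data no longer closes blocks early on the
            // unencoded size.
            return encodedSize >= (long) (configuredBlockSize * encodedRatio);
        }
        // Ratio not set: fall back to the unencoded-size check.
        return unencodedSize >= configuredBlockSize;
    }
}
```

With the ratio set, a block whose cells encode far below the threshold keeps filling even when the unencoded size has already passed the configured block size, which is what keeps the on-disk sizes homogeneous.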



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
