[
https://issues.apache.org/jira/browse/HBASE-27232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570159#comment-17570159
]
Bryan Beaudreault commented on HBASE-27232:
-------------------------------------------
Thanks! One more question – how could this unified.encoded.blocksize actually
reduce the number of blocks? Doesn't it result in smaller blocks and thus more
blocks?
> Fix checking for encoded block size when deciding if block should be closed
> ---------------------------------------------------------------------------
>
> Key: HBASE-27232
> URL: https://issues.apache.org/jira/browse/HBASE-27232
> Project: HBase
> Issue Type: Improvement
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
>
> On HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and
> uncompressed data size when deciding to close a block and start a new one.
> That could lead to varying "on-disk" block sizes, depending on the encoding
> efficiency for the cells in each block.
> HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio
> property, a ratio of the originally configured block size, to be compared
> against the encoded size. This was an attempt to ensure homogeneous block
> sizes. However, the check introduced by HBASE-17757 also considers the
> unencoded size; in cases where encoding is more efficient than the ratio
> configured in hbase.writer.unified.encoded.blocksize.ratio, this still
> leads to varying block sizes.
> This patch changes that logic to consider only the encoded size when the
> hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it
> considers the unencoded size. This gives finer control over the on-disk
> block sizes and the overall number of blocks when encoding is in use.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)