Wellington Chevreuil created HBASE-27232:
--------------------------------------------
Summary: Fix checking for encoded block size when deciding if
block should be closed
Key: HBASE-27232
URL: https://issues.apache.org/jira/browse/HBASE-27232
Project: HBase
Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil
In HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and
uncompressed data size when deciding whether to close a block and start a new
one. That could lead to varying "on-disk" block sizes, depending on the
encoding efficiency for the cells in each block.
HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio
property, a ratio of the originally configured block size to be compared
against the encoded size, as an attempt to ensure homogeneous block sizes.
However, the check introduced by HBASE-17757 also considers the unencoded
size, so whenever the encoding is more efficient than what is configured in
hbase.writer.unified.encoded.blocksize.ratio, it still leads to varying block
sizes.
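
A minimal sketch of that check, under illustrative names (shouldFinishBlock,
encodedSizeWritten, unencodedSizeWritten are not the actual HFileWriterImpl
fields), showing how an efficient encoding still closes blocks early:

{code:java}
// Simplified sketch of the boundary check after HBASE-17757.
// Names are illustrative, not the actual HFileWriterImpl fields.
public class CurrentBoundaryCheckSketch {

  // blockSize: the configured block size (e.g. 64KB);
  // ratio: hbase.writer.unified.encoded.blocksize.ratio.
  static boolean shouldFinishBlock(long encodedSizeWritten, long unencodedSizeWritten,
      long blockSize, double ratio) {
    long encodedLimit = (long) (blockSize * ratio);
    // The block is closed when EITHER the encoded size reaches the ratio-based
    // limit OR the unencoded size reaches the configured block size. When the
    // encoding shrinks the cells more than the ratio anticipates, the unencoded
    // condition fires first, so on-disk block sizes still vary with the data.
    return encodedSizeWritten >= encodedLimit || unencodedSizeWritten >= blockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64 * 1024;
    double ratio = 1.0;
    // Encoding happens to shrink this data 4x: the unencoded check closes the
    // block at roughly 16KB on disk, well below the 64KB target.
    System.out.println(shouldFinishBlock(16 * 1024, 64 * 1024, blockSize, ratio)); // true
  }
}
{code}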
This patch changes that logic to only consider the encoded size when the
hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it
considers the unencoded size. This gives finer control over the on-disk block
sizes and the overall number of blocks when encoding is in use.
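
A sketch of the proposed logic, under the same illustrative names (the
encodedBlockSizeLimit parameter is assumed to be precomputed from the ratio
property and to be <= 0 when the property is not set):

{code:java}
// Sketch of the proposed logic: when the ratio property is explicitly set,
// only the encoded size drives the decision; otherwise the unencoded size
// is used as before. Names are illustrative, not the actual fields.
public class ProposedBoundaryCheckSketch {

  static boolean shouldFinishBlock(long encodedSizeWritten, long unencodedSizeWritten,
      long blockSize, long encodedBlockSizeLimit) {
    if (encodedBlockSizeLimit > 0) {
      // Ratio configured: compare only the encoded (on-disk) size, so blocks
      // come out close to the requested on-disk size regardless of how well
      // the data encodes.
      return encodedSizeWritten >= encodedBlockSizeLimit;
    }
    // Ratio not configured: fall back to the unencoded size.
    return unencodedSizeWritten >= blockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64 * 1024;
    long encodedLimit = blockSize; // ratio of 1.0 configured
    // The same 4x-encoding example no longer closes the block early: the
    // encoded size (16KB) is still well below the 64KB limit.
    System.out.println(shouldFinishBlock(16 * 1024, 64 * 1024, blockSize, encodedLimit)); // false
  }
}
{code}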
--
This message was sent by Atlassian Jira
(v8.20.10#820010)