Wellington Chevreuil created HBASE-27232:
--------------------------------------------

             Summary: Fix checking for encoded block size when deciding if 
block should be closed
                 Key: HBASE-27232
                 URL: https://issues.apache.org/jira/browse/HBASE-27232
             Project: HBase
          Issue Type: Improvement
            Reporter: Wellington Chevreuil
            Assignee: Wellington Chevreuil


In HFileWriterImpl.checkBlockBoundary, we used to consider only the unencoded and 
uncompressed data size when deciding whether to close a block and start a new one. That 
could lead to varying "on-disk" block sizes, depending on the encoding 
efficiency for the cells in each block.
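
For illustration, a minimal standalone sketch of that original decision (parameter and method names are illustrative, this is not the actual HFileWriterImpl code):

{code:java}
// Sketch of the pre-HBASE-17757 behaviour: only the unencoded/uncompressed
// size of the data written so far is compared against the configured block size.
public class BlockBoundarySketchOld {

  /**
   * @param unencodedSize bytes of cell data accumulated in the current block,
   *                      before encoding/compression
   * @param configuredBlockSize the configured HFile block size (e.g. 64 KB)
   * @return true if the current block should be closed and a new one started
   */
  static boolean shouldFinishBlock(long unencodedSize, long configuredBlockSize) {
    // The encoded ("on-disk") size is ignored here, so blocks that encode very
    // well end up much smaller on disk than the configured block size suggests.
    return unencodedSize >= configuredBlockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64 * 1024;
    System.out.println(shouldFinishBlock(70_000, blockSize)); // true
    System.out.println(shouldFinishBlock(30_000, blockSize)); // false
  }
}
{code}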

HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio 
property, a ratio of the originally configured block size to be compared 
against the encoded size. This was an attempt to ensure homogeneous block 
sizes. However, the check introduced by HBASE-17757 also considers the 
unencoded size, so in cases where the encoding efficiency is higher than 
what's configured in hbase.writer.unified.encoded.blocksize.ratio, it can 
still lead to varying block sizes.
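
A simplified sketch of that HBASE-17757-style check (names and signature are illustrative, not the actual code), showing why on-disk sizes can still vary:

{code:java}
// Sketch: the encoded size is compared against ratio * blockSize, but the
// unencoded size still closes the block as well.
public class BlockBoundarySketch17757 {

  static boolean shouldFinishBlock(long unencodedSize, long encodedSize,
      long configuredBlockSize, double encodedBlockSizeRatio) {
    long encodedLimit = (long) (configuredBlockSize * encodedBlockSizeRatio);
    // Either condition closes the block. When cells encode better than the
    // configured ratio anticipates, the unencoded check fires first and the
    // resulting on-disk block is smaller than encodedLimit, so on-disk sizes
    // still vary from block to block.
    return encodedSize >= encodedLimit || unencodedSize >= configuredBlockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64 * 1024;
    double ratio = 1.0;
    // ~5x encoding efficiency: the unencoded limit is hit while the encoded
    // (on-disk) size is still far below ratio * blockSize.
    System.out.println(shouldFinishBlock(65_536, 13_000, blockSize, ratio)); // true
  }
}
{code}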

This patch changes that logic to consider only the encoded size when the 
hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it 
considers the unencoded size. This gives finer control over the on-disk 
block sizes and the overall number of blocks when encoding is in use.
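
A simplified sketch of the new decision described above (again illustrative, not the actual patch code):

{code:java}
// Sketch: when hbase.writer.unified.encoded.blocksize.ratio is explicitly
// configured, only the encoded size drives the decision; otherwise the
// unencoded size is used, as before.
public class BlockBoundarySketchNew {

  static boolean shouldFinishBlock(long unencodedSize, long encodedSize,
      long configuredBlockSize, double encodedBlockSizeRatio, boolean ratioConfigured) {
    if (ratioConfigured) {
      // Only the on-disk (encoded) size matters, giving homogeneous on-disk
      // block sizes regardless of how well each block's cells encode.
      return encodedSize >= (long) (configuredBlockSize * encodedBlockSizeRatio);
    }
    return unencodedSize >= configuredBlockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64 * 1024;
    // With the ratio configured, a well-encoding block stays open until its
    // encoded size reaches the limit, even though the unencoded size is large.
    System.out.println(shouldFinishBlock(200_000, 40_000, blockSize, 1.0, true)); // false
    System.out.println(shouldFinishBlock(200_000, 66_000, blockSize, 1.0, true)); // true
  }
}
{code}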



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
