[ https://issues.apache.org/jira/browse/HBASE-27232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wellington Chevreuil updated HBASE-27232:
-----------------------------------------
Release Note:
This changes the behaviour of the "hbase.writer.unified.encoded.blocksize.ratio"
property.
Previous behaviour: when delimiting hfile blocks, the writer checked whether
(encoded block size >= "hbase.writer.unified.encoded.blocksize.ratio" * BLOCK_SIZE)
|| (non-encoded block size >= BLOCK_SIZE). Because the non-encoded condition
was usually met first, setting "hbase.writer.unified.encoded.blocksize.ratio"
usually had no effect. The default value for
"hbase.writer.unified.encoded.blocksize.ratio" was "1".
New behaviour: if "hbase.writer.unified.encoded.blocksize.ratio" is set to any
value other than "0", the writer checks only whether encoded block size >=
("hbase.writer.unified.encoded.blocksize.ratio" * BLOCK_SIZE) when delimiting
an hfile block. If "hbase.writer.unified.encoded.blocksize.ratio" is not set,
it checks whether (encoded block size >= BLOCK_SIZE) || (non-encoded block
size >= BLOCK_SIZE) when delimiting an hfile block.
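A minimal Java sketch of the two boundary predicates described in the release note, for illustration only. The class and method names (BlockBoundarySketch, shouldCloseOld, shouldCloseNew) are hypothetical and are not the HFileWriterImpl API; treating a ratio of 0 as "property not set" is also an assumption made for the sketch.

```java
/**
 * Hypothetical sketch of the HBASE-27232 block boundary rules; not HBase code.
 * Sizes are in bytes. ASSUMPTION: a ratio of 0 stands for "property not set".
 */
final class BlockBoundarySketch {

    // Previous behaviour (HBASE-17757): close when EITHER limit is reached.
    static boolean shouldCloseOld(long encodedSize, long unencodedSize,
                                  long blockSize, double ratio) {
        return encodedSize >= ratio * blockSize || unencodedSize >= blockSize;
    }

    // New behaviour (HBASE-27232), as described in the release note.
    static boolean shouldCloseNew(long encodedSize, long unencodedSize,
                                  long blockSize, double ratio) {
        if (ratio != 0.0) {
            // Ratio set: only the encoded size decides the boundary.
            return encodedSize >= ratio * blockSize;
        }
        // Ratio not set: compare both sizes against BLOCK_SIZE.
        return encodedSize >= blockSize || unencodedSize >= blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 64 * 1024;
        // 64 KiB of raw cell data that encoded down to 20,000 bytes, ratio 0.5.
        System.out.println(shouldCloseOld(20_000, 65_536, blockSize, 0.5)); // true
        System.out.println(shouldCloseNew(20_000, 65_536, blockSize, 0.5)); // false
    }
}
```

Under the old rule, the 64 KiB of unencoded data closes the block even though the encoded size is well below the 32 KiB target (0.5 * BLOCK_SIZE); under the new rule the writer keeps appending cells until the encoded size reaches that target.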
> Fix checking for encoded block size when deciding if block should be closed
> ---------------------------------------------------------------------------
>
> Key: HBASE-27232
> URL: https://issues.apache.org/jira/browse/HBASE-27232
> Project: HBase
> Issue Type: Improvement
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
> Fix For: 3.0.0-alpha-4
>
>
> In HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and
> uncompressed data size when deciding to close a block and start a new one.
> That could lead to varying "on-disk" block sizes, depending on the encoding
> efficiency for the cells in each block.
> HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio
> property, a ratio of the originally configured block size, to be compared
> against the encoded size. This was an attempt to ensure homogeneous block
> sizes. However, the check introduced by HBASE-17757 also considers the
> unencoded size, so in cases where the encoding efficiency is higher than
> what is configured in hbase.writer.unified.encoded.blocksize.ratio, it still
> leads to varying block sizes.
> This patch changes that logic to consider only the encoded size if the
> hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it
> considers the unencoded size. This gives finer control over the on-disk
> block sizes and the overall number of blocks when encoding is in use.
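To illustrate why the HBASE-17757 check still produced uneven blocks, the following hypothetical simulation appends identical cells (1000 unencoded bytes each, each encoding to a fixed number of bytes) until a rule closes the block, with BLOCK_SIZE = 64 KiB and a ratio of 0.5. It is a simplified model under those assumptions, not HFileWriterImpl itself; real cells vary in size and encoding efficiency, and the names used here are made up for the example.

```java
/**
 * Hypothetical simulation of block closing under the old and new rules.
 * ASSUMPTIONS: fixed-size cells, fixed encoded size per cell, ratio != 0.
 */
final class BlockSizeSimulation {

    static final long BLOCK_SIZE = 65_536; // 64 KiB

    /** Appends identical cells until the rule closes the block; returns the encoded size at close. */
    static long encodedSizeAtClose(long unencodedPerCell, long encodedPerCell,
                                   double ratio, boolean newRule) {
        if (unencodedPerCell <= 0 || encodedPerCell <= 0 || ratio <= 0) {
            throw new IllegalArgumentException("sizes and ratio must be positive");
        }
        long unencoded = 0;
        long encoded = 0;
        while (true) {
            unencoded += unencodedPerCell;
            encoded += encodedPerCell;
            boolean encodedFull = encoded >= ratio * BLOCK_SIZE;
            // Old rule also closes on unencoded size; new rule (ratio set) does not.
            boolean close = newRule ? encodedFull
                                    : encodedFull || unencoded >= BLOCK_SIZE;
            if (close) {
                return encoded;
            }
        }
    }

    public static void main(String[] args) {
        double ratio = 0.5;
        // 250 bytes/cell = better encoding, 500 bytes/cell = worse encoding.
        for (long encodedPerCell : new long[] {250, 500}) {
            System.out.printf("encodedPerCell=%d old=%d new=%d%n", encodedPerCell,
                encodedSizeAtClose(1000, encodedPerCell, ratio, false),
                encodedSizeAtClose(1000, encodedPerCell, ratio, true));
        }
    }
}
```

Running main prints `encodedPerCell=250 old=16500 new=33000` and `encodedPerCell=500 old=33000 new=33000`: with the old rule, the better-encoded cells hit the unencoded limit first and produce a block half the target size, while the new rule yields the same encoded block size in both cases.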
--
This message was sent by Atlassian Jira
(v8.20.10#820010)