[ 
https://issues.apache.org/jira/browse/HBASE-27232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-27232:
-----------------------------------------
    Release Note: 
This changes the behaviour of the "hbase.writer.unified.encoded.blocksize.ratio" 
property:

Previous behaviour: When delimiting hfile blocks, the writer checked whether 
(encoded block size >= "hbase.writer.unified.encoded.blocksize.ratio" * 
BLOCK_SIZE) || (non-encoded block size >= BLOCK_SIZE). Because the second 
condition was most often reached first, setting 
"hbase.writer.unified.encoded.blocksize.ratio" usually had no effect.
The default value for "hbase.writer.unified.encoded.blocksize.ratio" was "1".

New behaviour: If "hbase.writer.unified.encoded.blocksize.ratio" is set to 
anything other than "0", the writer checks only whether (encoded block size >= 
"hbase.writer.unified.encoded.blocksize.ratio" * BLOCK_SIZE) when delimiting 
an hfile block. If "hbase.writer.unified.encoded.blocksize.ratio" is not set, 
it checks whether (encoded block size >= BLOCK_SIZE) || (non-encoded block 
size >= BLOCK_SIZE) when delimiting an hfile block.
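
As a rough illustration of the two checks described above, the following Java 
sketch compares them for a single block. It is not the HFileWriterImpl code: 
the class name, method names, and parameters are hypothetical, and treating a 
ratio of 0 as "not set" is an assumption made for the example.

```java
// Illustrative sketch only; names and signatures are hypothetical.
final class BlockBoundarySketch {

    /** Check described under "Previous behaviour": either condition closes the block. */
    public static boolean shouldCloseBlockOld(long encodedSize, long unencodedSize,
                                              long blockSize, float ratio) {
        return encodedSize >= (long) (ratio * blockSize) || unencodedSize >= blockSize;
    }

    /**
     * Check described under "New behaviour". A ratio of 0 stands in for
     * "property not set" (an assumption of this sketch).
     */
    public static boolean shouldCloseBlockNew(long encodedSize, long unencodedSize,
                                              long blockSize, float ratio) {
        if (ratio != 0f) {
            // Ratio configured: only the encoded size is compared.
            return encodedSize >= (long) (ratio * blockSize);
        }
        // Ratio not configured: either size reaching BLOCK_SIZE closes the block.
        return encodedSize >= blockSize || unencodedSize >= blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 64 * 1024; // BLOCK_SIZE of 64 KB
        float ratio = 0.5f;         // target encoded size of 32 KB

        // Hypothetical block: 64 KB of cells that encoded down to 16 KB.
        long unencoded = 64 * 1024;
        long encoded = 16 * 1024;

        System.out.println("old: " + shouldCloseBlockOld(encoded, unencoded, blockSize, ratio));
        System.out.println("new: " + shouldCloseBlockNew(encoded, unencoded, blockSize, ratio));
    }
}
```

With a 64 KB BLOCK_SIZE and a ratio of 0.5, a block holding 64 KB of cells 
that encode down to 16 KB is closed by the previous check but stays open under 
the new one until its encoded size reaches 32 KB.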

> Fix checking for encoded block size when deciding if block should be closed
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-27232
>                 URL: https://issues.apache.org/jira/browse/HBASE-27232
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>             Fix For: 3.0.0-alpha-4
>
>
> In HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and 
> uncompressed data size when deciding to close a block and start a new one. 
> That could lead to varying "on-disk" block sizes, depending on the encoding 
> efficiency for the cells in each block.
> HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio 
> property, as a ratio of the original configured block size, to be compared 
> against the encoded size. This was an attempt to ensure homogeneous block 
> sizes. However, the check introduced by HBASE-17757 also considers the 
> unencoded size, so in cases where encoding efficiency is higher than 
> what's configured in hbase.writer.unified.encoded.blocksize.ratio, it 
> still leads to varying block sizes.
> This patch changes that logic to consider only the encoded size if the 
> hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it 
> considers the unencoded size. This gives finer control over the on-disk 
> block sizes and the overall number of blocks when encoding is in use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
