[jira] [Commented] (HBASE-23279) Switch default block encoding to ROW_INDEX_V1

Anoop Sam John (Jira) Fri, 24 Jan 2020 05:42:06 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-23279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022942#comment-17022942
 ]


Anoop Sam John commented on HBASE-23279:
----------------------------------------

The failure report here says
bq. expected:<DATA> but was:<ENCODED_DATA>
So are you seeing the offset/length issue only after fixing this assert problem?

Should we really enable this as the default for all tables? 
Our initial concern was whether this would increase the size of each block, 
since we also have to write the row offset metadata. If that happened, it 
might affect our BucketCache bucket sizes. Later we thought it would not, 
because we break the HFile into blocks based on either the encoded or the 
unencoded data size written. Whichever reaches the block size limit first, we 
treat that as a block and start a new one. So the thinking was that the size 
of each block would not change, but since each block might now contain fewer 
actual data cells, the number of blocks might increase. We thought that was OK.
But it seems that thinking was wrong! With this encoder, the encoded data 
size tracked so far (on the fly, while writing cells to the block) is exactly 
the same as the unencoded data size, because we write only the cells. The 
extra info this encoder writes (the row offsets) is initially kept in a BAOS. 
Only when the block is finished (after we learn that the size written so far 
has reached the block size limit) is the BAOS content written to the closing 
block. That means some delta gets written to a block that has already reached 
the size limit. The size of these extra offsets is NOT tracked during the 
writes to this block!
IMHO we should not switch to this encoding by default.
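
To make the size accounting concrete, here is a minimal standalone sketch of 
the behaviour described above. It is not the actual HBase encoder code: the 
class and method names are hypothetical, it assumes one fixed-size cell per 
row, and it appends only a 4-byte offset per row when the block closes 
(ignoring any other trailer fields the real encoder may write).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not HBase code: models a block writer that counts only
// cell bytes against the block size limit while row offsets pile up in a
// side buffer (the "BAOS") and are appended when the block is closed.
final class RowIndexBlockSizeSketch {

    /** Sizes observed for one finished block. */
    static final class BlockSizes {
        final int cells;        // number of cells written
        final int trackedBytes; // bytes counted while writing cells
        final int actualBytes;  // bytes in the block after the offsets are appended

        BlockSizes(int cells, int trackedBytes, int actualBytes) {
            this.cells = cells;
            this.trackedBytes = trackedBytes;
            this.actualBytes = actualBytes;
        }
    }

    // Writes a 4-byte big-endian int into the buffer.
    private static void writeInt(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    /** Fills one block with cells of cellSize bytes, assuming one cell per row. */
    static BlockSizes writeBlock(int blockSizeLimit, int cellSize) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        ByteArrayOutputStream rowOffsets = new ByteArrayOutputStream();
        byte[] cell = new byte[cellSize];
        int tracked = 0;
        int cells = 0;

        // Only the cell bytes are checked against the limit.
        while (tracked < blockSizeLimit) {
            writeInt(rowOffsets, block.size()); // row start offset, not counted
            block.write(cell, 0, cell.length);
            tracked += cell.length;
            cells++;
        }

        // Block close: the untracked offsets land in the already-full block.
        byte[] offsets = rowOffsets.toByteArray();
        block.write(offsets, 0, offsets.length);

        return new BlockSizes(cells, tracked, block.size());
    }

    public static void main(String[] args) {
        BlockSizes s = writeBlock(64 * 1024, 64);
        System.out.println("cells=" + s.cells
            + " tracked=" + s.trackedBytes
            + " actual=" + s.actualBytes);
    }
}
```

Under these assumptions, a 64 KB limit with 64-byte cells yields 1024 cells, 
65536 tracked bytes, and 69632 bytes actually in the block, i.e. 4096 bytes 
over the limit. The smaller the cells, the larger the untracked delta 
relative to the block size.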

> Switch default block encoding to ROW_INDEX_V1
> ---------------------------------------------
>
>                 Key: HBASE-23279
>                 URL: https://issues.apache.org/jira/browse/HBASE-23279
>             Project: HBase
>          Issue Type: Wish
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Lars Hofhansl
>            Assignee: Viraj Jasani
>            Priority: Minor
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: HBASE-23279.master.000.patch, 
> HBASE-23279.master.001.patch, HBASE-23279.master.002.patch, 
> HBASE-23279.master.003.patch, HBASE-23279.master.004.patch
>
>
> Currently we set both block encoding and compression to NONE.
> ROW_INDEX_V1 has many advantages and (almost) no disadvantages (the HFiles 
> are slightly larger, about 3% or so). I think that would be a better default 
> than NONE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
