[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790843#comment-17790843
 ] 

Kadir Ozdemir commented on HBASE-25972:
---------------------------------------

[~apurtell], [~vjasani], please see the design doc and PR, and feel free to add 
reviewers for the PR. Please note that this is my first HBase PR, and any 
feedback will be greatly appreciated. Thanks.

> Dual File Compactor
> -------------------
>
>                 Key: HBASE-25972
>                 URL: https://issues.apache.org/jira/browse/HBASE-25972
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir Ozdemir
>            Priority: Major
>
> HBase stores tables row by row in its files, HFiles. An HFile is composed of 
> blocks. The number of rows stored in a block depends on the row sizes. The 
> number of rows per block gets lower when the rows have more than one version 
> since HBase stores all row versions sequentially in the same HFile after 
> compaction. However, applications (e.g., Phoenix) mostly query the most 
> recent row versions.
> Let us assume that compaction generates two HFiles instead of one. One of 
> these files stores only the most recent cell versions. Let’s call this a 
> single-version HFile. The other HFile stores all the previous cell versions. 
> Let’s call this a multi-version HFile. The files that are generated by 
> memstore flushes will be multi-version files. The major and minor compaction 
> processes will generate single-version files as well as multi-version files. 
> This means that for queries on the most recent row versions, HBase does not 
> need to look into multi-version HFiles that are older than the latest 
> single-version HFiles.
> The blocks of single-version HFiles will generally be denser than those of 
> current HFiles, which will improve query times for queries on the most 
> recent row versions. 
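
To make the split described above concrete, here is a minimal sketch in Java. It is not the code from the PR and does not use HBase's real classes: SimpleCell, SplitResult and split are hypothetical names invented for illustration, the two output "files" are plain in-memory lists, and delete markers, TTLs and max-version limits are ignored. It assumes the input cells are already sorted by row, then column, then timestamp descending, so the first cell seen for each row and column is the most recent version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DualFileCompactionSketch {

    /** Hypothetical, simplified cell; this is NOT HBase's Cell interface. */
    static final class SimpleCell {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        SimpleCell(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** The two compaction outputs, modeled as in-memory lists (an assumption). */
    static final class SplitResult {
        final List<SimpleCell> singleVersion = new ArrayList<>();
        final List<SimpleCell> multiVersion = new ArrayList<>();
    }

    /**
     * Routes the newest version of each (row, column) to the single-version
     * output and every older version to the multi-version output.
     * Assumes input sorted by row, column, then timestamp descending.
     */
    static SplitResult split(List<SimpleCell> sortedCells) {
        SplitResult result = new SplitResult();
        SimpleCell previous = null;
        for (SimpleCell cell : sortedCells) {
            boolean sameColumnAsPrevious = previous != null
                && previous.row.equals(cell.row)
                && previous.column.equals(cell.column);
            if (sameColumnAsPrevious) {
                // An older version of a column whose newest cell was already emitted.
                result.multiVersion.add(cell);
            } else {
                // First (newest) cell for this row and column.
                result.singleVersion.add(cell);
            }
            previous = cell;
        }
        return result;
    }

    public static void main(String[] args) {
        List<SimpleCell> cells = Arrays.asList(
            new SimpleCell("r1", "cf:a", 30L, "v3"),
            new SimpleCell("r1", "cf:a", 20L, "v2"),
            new SimpleCell("r1", "cf:a", 10L, "v1"),
            new SimpleCell("r2", "cf:a", 15L, "v1"));
        SplitResult result = split(cells);
        System.out.println("single-version cells: " + result.singleVersion.size());
        System.out.println("multi-version cells: " + result.multiVersion.size());
    }
}
```

In this toy input, the single-version list holds one cell per row and column, which is why its blocks would pack more rows than a file that interleaves all versions.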



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
