[ https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kadir Ozdemir reassigned HBASE-25972:
-------------------------------------
Assignee: Kadir Ozdemir
> Single and Multi-version HFiles
> -------------------------------
>
> Key: HBASE-25972
> URL: https://issues.apache.org/jira/browse/HBASE-25972
> Project: HBase
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Assignee: Kadir Ozdemir
> Priority: Major
>
> HBase stores tables row by row in its files, HFiles. An HFile is composed of
> blocks, and the number of rows stored in a block depends on the row sizes.
> When rows have more than one version, each block holds fewer rows, because
> after compaction HBase stores all versions of a row sequentially in the same
> HFile. However, applications (e.g., Phoenix) mostly query the most recent
> row versions.
> Suppose that compaction generates two HFiles instead of one. One of these
> files stores only the most recent version of each cell; let’s call it a
> single-version HFile. The other stores all the previous cell versions; let’s
> call it a multi-version HFile. Files generated by memstore flushes will be
> multi-version HFiles. Both major and minor compactions will generate
> single-version files as well as multi-version files. This means that, for
> queries on the most recent row versions, HBase does not need to read
> multi-version HFiles that are older than the latest single-version HFile.
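A minimal Java sketch of the two mechanisms described above, for illustration only: splitting a compaction's merged cells into a single-version output (the newest timestamp per row and column) and a multi-version output (every older version), and choosing which files a latest-version read must open. The `Cell`, `FileInfo`, and `Kind` types, the `maxSeqId` ordering field, and the method names are hypothetical and are not HBase APIs; deletes, TTLs, and timestamp ties are ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VersionSplitSketch {
    // Hypothetical, simplified cell; HBase's real Cell interface carries more fields.
    record Cell(String row, String column, long timestamp, String value) {}

    enum Kind { SINGLE_VERSION, MULTI_VERSION }

    // Hypothetical file descriptor; a larger maxSeqId means a newer file.
    record FileInfo(String name, Kind kind, long maxSeqId) {}

    record SplitResult(List<Cell> singleVersion, List<Cell> multiVersion) {}

    // Sort by row, then column, then newest timestamp first.
    static final Comparator<Cell> CELL_ORDER = Comparator.comparing(Cell::row)
            .thenComparing(Cell::column)
            .thenComparing(Comparator.comparingLong(Cell::timestamp).reversed());

    // Compaction split: keep the newest version of each (row, column) in the
    // single-version output and route every older version to the multi-version output.
    static SplitResult split(List<Cell> mergedInput) {
        Map<List<String>, Cell> newest = new HashMap<>();
        List<Cell> older = new ArrayList<>();
        for (Cell c : mergedInput) {
            List<String> key = List.of(c.row(), c.column());
            Cell prev = newest.get(key);
            if (prev == null) {
                newest.put(key, c);
            } else if (c.timestamp() > prev.timestamp()) {
                older.add(prev); // the previous newest becomes an older version
                newest.put(key, c);
            } else {
                older.add(c);
            }
        }
        List<Cell> single = new ArrayList<>(newest.values());
        single.sort(CELL_ORDER);
        older.sort(CELL_ORDER);
        return new SplitResult(single, older);
    }

    // Read-path pruning as stated in the proposal: every single-version file is
    // read, but multi-version files that are not newer than the newest
    // single-version file (including that compaction's own multi-version output)
    // are skipped for latest-version queries.
    static List<FileInfo> filesForLatestVersionRead(List<FileInfo> files) {
        long newestSingle = files.stream()
                .filter(f -> f.kind() == Kind.SINGLE_VERSION)
                .mapToLong(FileInfo::maxSeqId)
                .max()
                .orElse(Long.MIN_VALUE); // no single-version file: read everything
        List<FileInfo> selected = new ArrayList<>();
        for (FileInfo f : files) {
            if (f.kind() == Kind.SINGLE_VERSION || f.maxSeqId() > newestSingle) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        SplitResult r = split(List.of(
                new Cell("r1", "c", 3, "v3"),
                new Cell("r1", "c", 1, "v1"),
                new Cell("r1", "c", 2, "v2"),
                new Cell("r2", "c", 5, "x")));
        System.out.println("single-version: " + r.singleVersion());
        System.out.println("multi-version:  " + r.multiVersion());

        List<FileInfo> files = List.of(
                new FileInfo("latest-compaction-sv", Kind.SINGLE_VERSION, 20),
                new FileInfo("latest-compaction-mv", Kind.MULTI_VERSION, 20),
                new FileInfo("flushed-after", Kind.MULTI_VERSION, 30));
        filesForLatestVersionRead(files).forEach(f -> System.out.println("read " + f.name()));
    }
}
```

The file rule follows the text literally. It is only safe if the single-version files together hold the newest version of every cell written up to the newest of them (as after a major compaction); the sketch assumes that invariant rather than enforcing it.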
> In general, the blocks of single-version HFiles will be denser than those of
> current HFiles, so fewer blocks need to be read, which will improve query
> times for the most recent row versions.
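The density argument can be made concrete with back-of-the-envelope arithmetic. This sketch assumes every row version has the same serialized size and ignores key, metadata, and block-boundary overhead; the 1 KB row-version size and four versions per row are hypothetical inputs, while 64 KB is HBase's default block size.

```java
public class BlockDensitySketch {
    // Approximate rows per block when each row is stored with `versionsPerRow`
    // versions of equal size. Ignores key/metadata overhead and block-boundary
    // effects, so it is a rough estimate, not an exact HFile layout.
    static long rowsPerBlock(long blockSizeBytes, long bytesPerRowVersion, int versionsPerRow) {
        if (blockSizeBytes <= 0 || bytesPerRowVersion <= 0 || versionsPerRow <= 0) {
            throw new IllegalArgumentException("sizes and version count must be positive");
        }
        return blockSizeBytes / (bytesPerRowVersion * versionsPerRow);
    }

    public static void main(String[] args) {
        long blockSize = 64 * 1024; // HBase's default HFile block size
        long rowVersionSize = 1024; // hypothetical: 1 KB per row version
        int versions = 4;           // hypothetical: 4 versions retained per row

        long allVersions = rowsPerBlock(blockSize, rowVersionSize, versions);
        long latestOnly = rowsPerBlock(blockSize, rowVersionSize, 1);
        System.out.println(allVersions + " rows per block with all versions together");
        System.out.println(latestOnly + " rows per block in a single-version HFile");
    }
}
```

With these inputs the program prints 16 rows per block when all versions are stored together versus 64 in a single-version HFile, so a latest-version scan over the same rows would touch roughly a quarter of the blocks.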
--
This message was sent by Atlassian Jira
(v8.20.10#820010)