Kadir Ozdemir created HBASE-25972:
-------------------------------------
Summary: Single and Multi-version HFiles
Key: HBASE-25972
URL: https://issues.apache.org/jira/browse/HBASE-25972
Project: HBase
Issue Type: Improvement
Reporter: Kadir Ozdemir
HBase stores tables row by row in its files, called HFiles. An HFile is
composed of blocks, and the number of rows stored in a block depends on the
row sizes. The number of rows per block drops when rows have more than one
version, since HBase stores all versions of a row sequentially in the same
HFile after compaction. However, applications (e.g., Phoenix) mostly query the
most recent row versions.
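To make the density argument concrete, here is a rough back-of-the-envelope
sketch. It assumes a 64 KB block (the HBase default block size) and a
hypothetical 256 bytes per row version; the class and method names are made up
for illustration, and real blocks are only approximately this size.

```java
public class RowsPerBlockSketch {
    // Approximate number of rows that fit in one block when every row
    // carries `versions` versions of about `bytesPerRowVersion` bytes each.
    static long rowsPerBlock(long blockSizeBytes, long bytesPerRowVersion, int versions) {
        return blockSizeBytes / (bytesPerRowVersion * versions);
    }

    public static void main(String[] args) {
        long blockSize = 64 * 1024; // HBase default block size
        long rowBytes = 256;        // assumed size of one row version

        System.out.println(rowsPerBlock(blockSize, rowBytes, 1)); // prints 256
        System.out.println(rowsPerBlock(blockSize, rowBytes, 4)); // prints 64
    }
}
```

Under these assumptions, keeping four versions per row means a query for the
latest versions reads about four times as many blocks to cover the same rows.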
Let us assume that compaction generates two HFiles instead of one. One of
these files stores only the most recent cell versions; let's call it the
single-version HFile. The other stores all the previous cell versions; let's
call it the multi-version HFile. The files generated by memstore flushes will
be of the multi-version type. Both major and minor compactions will generate
single-version files as well as multi-version files. This means that, for
queries on the most recent row versions, HBase does not need to look into
multi-version HFiles that are older than the latest single-version HFile.
The blocks of single-version HFiles will generally be denser than those of
current HFiles, which will improve query times for queries on the most recent
row versions.
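The proposed split could be sketched roughly as follows. This is not HBase
code: the Cell and SplitResult classes and the split method are hypothetical,
and the sketch ignores column families, delete markers, TTLs and max-version
limits. It only relies on the fact that HBase orders cells by row, then
column, then timestamp descending, so the first cell seen for a column is its
newest version.

```java
import java.util.ArrayList;
import java.util.List;

public class VersionSplitSketch {
    // Hypothetical, simplified cell; real HBase cells also carry a family and a type.
    static final class Cell {
        final String row;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String row, String qualifier, long timestamp, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Stand-in for the two compaction outputs proposed above.
    static final class SplitResult {
        final List<Cell> singleVersion = new ArrayList<>();
        final List<Cell> multiVersion = new ArrayList<>();
    }

    // Input must be sorted by row, qualifier, then timestamp descending.
    static SplitResult split(List<Cell> sortedCells) {
        SplitResult out = new SplitResult();
        Cell previous = null;
        for (Cell cell : sortedCells) {
            // A cell that repeats the previous (row, qualifier) is an older version.
            boolean olderVersionOfSameColumn = previous != null
                    && previous.row.equals(cell.row)
                    && previous.qualifier.equals(cell.qualifier);
            if (olderVersionOfSameColumn) {
                out.multiVersion.add(cell);
            } else {
                out.singleVersion.add(cell);
            }
            previous = cell;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("row1", "a", 30, "v3"),
                new Cell("row1", "a", 20, "v2"),
                new Cell("row1", "a", 10, "v1"),
                new Cell("row2", "a", 20, "w2"),
                new Cell("row2", "a", 10, "w1"));
        SplitResult result = split(cells);
        System.out.println("single-version cells: " + result.singleVersion.size()); // 2
        System.out.println("multi-version cells: " + result.multiVersion.size());   // 3
    }
}
```

In this toy input, the single-version output holds two cells for two rows,
while the three older versions go to the multi-version output.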