Kadir Ozdemir created HBASE-25972:
-------------------------------------

             Summary: Single and Multi-version HFiles
                 Key: HBASE-25972
                 URL: https://issues.apache.org/jira/browse/HBASE-25972
             Project: HBase
          Issue Type: Improvement
            Reporter: Kadir Ozdemir


HBase stores tables row by row in its files, HFiles. An HFile is composed of 
blocks. The number of rows stored in a block depends on the row sizes. The 
number of rows per block decreases when rows have more than one version, 
since HBase stores all row versions sequentially in the same HFile after 
compaction. However, applications (e.g., Phoenix) mostly query the most recent 
row versions.
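
As a rough illustration of the effect described above, the following sketch 
estimates how many rows fit in one block. It assumes fixed-size row versions 
and ignores key overhead, encoding, and compression. The class and method 
names are hypothetical, and the 1 KB row size is an arbitrary example value; 
64 KB is HBase's default HFile block size.

```java
public class BlockDensitySketch {
    /**
     * Approximate number of rows per block when every row keeps `versions`
     * versions of `bytesPerRowVersion` bytes each (a deliberate simplification).
     */
    public static long rowsPerBlock(long blockSizeBytes, long bytesPerRowVersion, int versions) {
        return blockSizeBytes / (bytesPerRowVersion * versions);
    }

    public static void main(String[] args) {
        long blockSize = 64 * 1024; // default HFile block size in bytes
        System.out.println(rowsPerBlock(blockSize, 1024, 1)); // prints 64
        System.out.println(rowsPerBlock(blockSize, 1024, 4)); // prints 16
    }
}
```

Under these assumptions, a scan over the most recent versions reads four 
times as many blocks when each row carries four versions.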

Let us assume that compaction generates two HFiles instead of one. One of 
these files stores only the most recent cell versions. Let’s call this a 
single-version HFile. The other HFile stores all the previous cell versions. 
Let’s call this a multi-version HFile. The files that are generated by memstore 
flushes will be of the multi-version type. The major and minor compaction 
processes will generate single-version files as well as multi-version files. 
This means that, for queries on the most recent row versions, HBase does not 
need to look into multi-version HFiles that are older than the latest 
single-version HFile.
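
The idea above can be sketched as follows. This is a minimal illustration, 
not HBase code: the Cell and FileInfo classes, the seqId field used to order 
files by age, and both method names are hypothetical. It also ignores delete 
markers, TTLs, and column families. The split method assumes the input is 
sorted the way HBase sorts cells (row ascending, column ascending, timestamp 
descending), so the first cell of each (row, column) run is the most recent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VersionedHFileSketch {
    /** Simplified cell; real HBase cells carry more fields. */
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        Cell(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical file descriptor; a larger seqId means a newer file. */
    static final class FileInfo {
        final String name;
        final boolean singleVersion;
        final long seqId;
        FileInfo(String name, boolean singleVersion, long seqId) {
            this.name = name;
            this.singleVersion = singleVersion;
            this.seqId = seqId;
        }
    }

    /** Routes the newest cell of each (row, column) to one list, the rest to the other. */
    static void split(List<Cell> sorted, List<Cell> singleVersion, List<Cell> multiVersion) {
        Cell prev = null;
        for (Cell c : sorted) {
            boolean newest = prev == null
                || !prev.row.equals(c.row)
                || !prev.column.equals(c.column);
            if (newest) {
                singleVersion.add(c);
            } else {
                multiVersion.add(c);
            }
            prev = c;
        }
    }

    /**
     * Files a latest-version read still has to open: everything except
     * multi-version files older than the latest single-version file.
     * How a multi-version file written by the same compaction is ordered
     * relative to its single-version sibling is left open here.
     */
    static List<FileInfo> filesForLatestRead(List<FileInfo> files) {
        long latestSingle = Long.MIN_VALUE;
        for (FileInfo f : files) {
            if (f.singleVersion) {
                latestSingle = Math.max(latestSingle, f.seqId);
            }
        }
        List<FileInfo> needed = new ArrayList<>();
        for (FileInfo f : files) {
            boolean skippable = !f.singleVersion && f.seqId < latestSingle;
            if (!skippable) {
                needed.add(f);
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
            new FileInfo("mv-old", false, 1),   // older multi-version file
            new FileInfo("sv", true, 5),        // latest single-version file
            new FileInfo("mv-flush", false, 7)  // newer memstore flush
        );
        for (FileInfo f : filesForLatestRead(files)) {
            System.out.println(f.name); // prints sv, then mv-flush
        }
    }
}
```

In this sketch, files flushed after the latest single-version file are still 
read, since they may hold newer versions of the same rows.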

The blocks of single-version HFiles will generally be denser than those of 
current HFiles, and this will improve query times for queries on the most 
recent row versions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
