[ 
https://issues.apache.org/jira/browse/HBASE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779483#comment-13779483
 ] 

Jonathan Hsieh commented on HBASE-9583:
---------------------------------------

An HFile consists of many blocks, each holding a sorted range of Cells.  Each 
Cell has a key.  To save IO when reading Cells, the HFile also has an index 
that maps an index key for each block to the offset at which that block 
begins.  Prior to this optimization, HBase would use the key of the first Cell 
in each data block as the index key.  

In HBASE-7845, we generate a new key that is lexicographically larger than the 
last key of the previous block and lexicographically less than or equal to the 
start key of the current block.  While actual keys can potentially be very 
long, this "fake key" or "virtual key" can be much shorter.  For example, if 
the last key of the previous block is "the quick brown fox" and the start key 
of the current block is "the who", we could use "the r" as our virtual key in 
the HFile index. 
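
To make the example concrete, here is a minimal sketch of picking such a 
separator for two plain byte arrays.  It is not the actual getShortMidpointKey 
code, which works on full HBase keys; the class and method names below are 
hypothetical, and the only assumption is that keys compare as unsigned bytes. 

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the HBase implementation.
final class ShortSeparatorSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Returns a key k with left < k <= right that is no longer than
    // (common prefix length + 1) bytes. Requires left < right.
    static byte[] shortSeparator(byte[] left, byte[] right) {
        if (compare(left, right) >= 0) {
            throw new IllegalArgumentException("left must sort before right");
        }
        // Find the first position where the two keys differ.
        int d = 0;
        int n = Math.min(left.length, right.length);
        while (d < n && left[d] == right[d]) {
            d++;
        }
        if (d < left.length) {
            int l = left[d] & 0xff;
            int r = right[d] & 0xff;
            if (l + 1 < r) {
                // There is room between the differing bytes: bump left's byte.
                byte[] out = Arrays.copyOf(left, d + 1);
                out[d] = (byte) (l + 1);
                return out;
            }
        }
        // Either left is a prefix of right, or the differing bytes are
        // adjacent: the shortest distinguishing prefix of right still works.
        return Arrays.copyOf(right, d + 1);
    }

    public static void main(String[] args) {
        byte[] left = "the quick brown fox".getBytes(StandardCharsets.UTF_8);
        byte[] right = "the who".getBytes(StandardCharsets.UTF_8);
        byte[] sep = shortSeparator(left, right);
        // Prints: the r
        System.out.println(new String(sep, StandardCharsets.UTF_8));
    }
}
```

The two keys share the prefix "the " and first differ at 'q' versus 'w', so 
the sketch keeps the prefix and uses 'q' + 1 = 'r' as the final byte. 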

There are two benefits to this: 
  1) having shorter keys reduces the HFile index size (allowing us to keep 
more indexes in memory), and 
  2) using something closer to the end key of the previous block allows us to 
avoid a potential extra IO when the target key lies between the "virtual 
key" and the key of the first element in the target block.
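
The second benefit can be sketched with a toy index.  This is an illustration 
under assumptions, not HBase code: a TreeMap stands in for the HFile block 
index, block numbers stand in for offsets, a lookup is assumed to pick the 
block with the greatest index key that is less than or equal to the target, 
and "the apple" is a made-up first key for the previous block. 

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; a TreeMap stands in for the block index.
final class BlockIndexSketch {

    // Returns the block a lookup would read first: the one whose index key
    // is the greatest key <= target, or -1 if target sorts before all keys.
    static int blockFor(TreeMap<String, Integer> index, String target) {
        Map.Entry<String, Integer> e = index.floorEntry(target);
        return e == null ? -1 : e.getValue();
    }

    // Index keyed by the real first key of each block (the old behaviour).
    static TreeMap<String, Integer> firstKeyIndex() {
        TreeMap<String, Integer> index = new TreeMap<>();
        index.put("the apple", 0); // block 0 ends at "the quick brown fox"
        index.put("the who", 1);   // block 1 starts at "the who"
        return index;
    }

    // Same blocks, but block 1 is indexed by the shorter virtual key.
    static TreeMap<String, Integer> virtualKeyIndex() {
        TreeMap<String, Integer> index = new TreeMap<>();
        index.put("the apple", 0);
        index.put("the r", 1);
        return index;
    }

    public static void main(String[] args) {
        String target = "the sun"; // sorts between "the r" and "the who"
        // Prints: 0
        System.out.println(blockFor(firstKeyIndex(), target));
        // Prints: 1
        System.out.println(blockFor(virtualKeyIndex(), target));
    }
}
```

With the first-key index, the lookup for "the sun" starts in block 0, which 
cannot contain it, so a seek may have to read block 1 as well.  With the 
virtual key, the lookup goes straight to block 1. 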

This optimization (implemented by the getShortMidpointKey method) is inspired 
by LevelDB's BytewiseComparatorImpl::FindShortestSeparator() and 
FindShortSuccessor().
                
> add document for getShortMidpointKey
> ------------------------------------
>
>                 Key: HBASE-9583
>                 URL: https://issues.apache.org/jira/browse/HBASE-9583
>             Project: HBase
>          Issue Type: Task
>          Components: HFile
>    Affects Versions: 0.98.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HBase-9583.txt, HBase-9583-v2.txt
>
>
> add the faked key to documentation http://hbase.apache.org/book.html#hfilev2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
