[ 
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276211#comment-13276211
 ] 

Phabricator commented on HBASE-5987:
------------------------------------

tedyu has commented on the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex improvement".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:411
  'is to keep' -> 'keeps'
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:415
  'it means it' -> 'it means that'
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:205
  Please add javadoc for the last three parameters
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:208
  Can this method be named getDataBlockInfo() ?
  For 'seekTo', I think DataBlock would be the target, not DataBlockInfo.
  See comment below w.r.t. naming of DataBlockInfo
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:196
  'other attributes' -> 'additional attributes' ?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:293
  'Only ' can be removed.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockInfo.java:2
  No year, please.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:306
  Can we use builder pattern to fill out nextIndexedKey ?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockInfo.java:26
  Would HFileBlockWithInfo be a better name ?
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:480
  Should this be '< 0' ?
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:2
  Please remove year.
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:44
  Please add test category.

REVISION DETAIL
  https://reviews.facebook.net/D3237

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu

                
> HFileBlockIndex improvement
> ---------------------------
>
>                 Key: HBASE-5987
>                 URL: https://issues.apache.org/jira/browse/HBASE-5987
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D3237.1.patch, 
> screen_shot_of_sequential_scan_profiling.png
>
>
> We recently found a performance problem: reads are quite slow when multiple
> requests read the same block of data or index.
> Profiling shows that one cause is IdLock contention, which has been
> addressed in HBASE-5898.
> Another issue is that the HFileScanner keeps asking the HFileBlockIndex for
> the data block location of each target key value during the scan
> (reSeekTo), even when the target key value is already in the current data
> block. This makes certain index blocks very hot, especially during a
> sequential scan.
> To solve this issue, we propose the following solutions:
> First, we propose to look ahead by one more block index entry so that the
> HFileScanner knows the start key value of the next data block. If the
> target key value for the scan (reSeekTo) is "smaller" than the start kv of
> the next data block, the target key value is very likely in the current
> data block (if it is not in the current data block, then the start kv of
> the next data block should be returned. +Indexing on the start key has some
> defects here+), and the scanner shall NOT query the HFileBlockIndex in this
> case. Conversely, if the target key value is "bigger", then it shall query
> the HFileBlockIndex. This improvement should reduce the hotness of the
> HFileBlockIndex and avoid some unnecessary IdLock contention or index
> block cache lookups.
> Second, we propose to push this idea a little further: the HFileBlockIndex
> shall index on the last key value of each data block instead of the start
> key value. The motivation is to solve the HBASE-4443 issue (avoid seeking
> to the "previous" block when the key you are interested in is the first one
> of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value
> of data block N, there is no way to know for sure whether the target key
> value belongs to data block N or N-1, so the scanner has to seek from data
> block N-1. However, if the block index is based on the last key value of
> each data block and the target key value is between the last key values of
> data block N-1 and data block N, then the target key value can only be in
> data block N.
> As long as HBase supports only forward scans, it makes more sense to index
> on the last key value than on the start key value.
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.
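
The two proposals in the description above can be illustrated with a small, self-contained Java sketch. It is not taken from the D3237 patch: the class BlockIndexSketch, its method names, and the use of plain String keys in sorted arrays are all assumptions made for illustration, and real HFile keys are byte arrays compared with a KeyValue comparator.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names come from the HBase patch. */
final class BlockIndexSketch {
    private BlockIndexSketch() {}

    /**
     * Lookahead check: the scanner remembers the first key of the next data
     * block. If the target sorts before that key, the scanner can stay in the
     * current block and skip the block index lookup.
     */
    static boolean canSkipIndexLookup(String target, String nextBlockFirstKey) {
        return target.compareTo(nextBlockFirstKey) < 0;
    }

    /**
     * Index built on first keys: returns the last block whose first key is
     * less than or equal to the target, or -1 if the target sorts before
     * every block.
     */
    static int blockByFirstKey(String[] firstKeys, String target) {
        int pos = Arrays.binarySearch(firstKeys, target);
        // When not found, pos == -(insertionPoint) - 1; we want insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    /**
     * Index built on last keys: returns the first block whose last key is
     * greater than or equal to the target, or lastKeys.length if the target
     * sorts after every block.
     */
    static int blockByLastKey(String[] lastKeys, String target) {
        int pos = Arrays.binarySearch(lastKeys, target);
        // When not found, pos == -(insertionPoint) - 1; we want insertionPoint.
        return pos >= 0 ? pos : -pos - 1;
    }
}
```

With three blocks holding keys a..c, f..h and k..m, a target of "e" resolves to block 0 through the first-key index even though the next key at or after "e" lives in block 1, while the last-key index resolves it to block 1 directly. That gap case is the situation the description refers to as the defect of indexing on the start key.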

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
