[
https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478679#comment-13478679
]
Hudson commented on HBASE-5987:
-------------------------------
Integrated in HBase-0.94 #539 (See
[https://builds.apache.org/job/HBase-0.94/539/])
HBASE-6032 Port HFileBlockIndex improvement from HBASE-5987 (Liyin, Ted,
Stack) (Revision 1399513)
Result = FAILURE
larsh :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java
> HFileBlockIndex improvement
> ---------------------------
>
> Key: HBASE-5987
> URL: https://issues.apache.org/jira/browse/HBASE-5987
> Project: HBase
> Issue Type: Improvement
> Reporter: Liyin Tang
> Assignee: Liyin Tang
> Fix For: 0.96.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--D3237.1.patch,
> ASF.LICENSE.NOT.GRANTED--D3237.2.patch,
> ASF.LICENSE.NOT.GRANTED--D3237.3.patch,
> ASF.LICENSE.NOT.GRANTED--D3237.4.patch,
> ASF.LICENSE.NOT.GRANTED--D3237.5.patch,
> ASF.LICENSE.NOT.GRANTED--D3237.6.patch,
> ASF.LICENSE.NOT.GRANTED--D3237.7.patch,
> ASF.LICENSE.NOT.GRANTED--D3237.8.patch,
> screen_shot_of_sequential_scan_profiling.png
>
>
> Recently we found a performance problem: reads are quite slow when
> multiple requests read the same data or index block.
> Profiling shows that one cause is IdLock contention, which has been
> addressed in HBASE-5898.
> Another issue is that the HFileScanner keeps asking the HFileBlockIndex
> for the data block location of each target key value during the scan
> (reSeekTo), even when the target key value is already in the
> current data block. This makes certain index blocks very hot,
> especially during a sequential scan.
> To solve this issue, we propose the following:
> First, we propose to look ahead one more block index entry so that the
> HFileScanner knows the start key value of the next data block. If the
> target key value for the scan (reSeekTo) is "smaller" than the start kv of
> the next data block, the target key value is very likely in the current
> data block (if it is not in the current data block, the start kv of the
> next data block should be returned; +indexing on the start key has some
> defects here+), and the scanner shall NOT query the HFileBlockIndex in this
> case. Conversely, if the target key value is "bigger", the scanner shall
> query the HFileBlockIndex. This improvement should reduce the hotness of the
> HFileBlockIndex and avoid some unnecessary IdLock contention and index block
> cache lookups.
> Second, we propose to push this idea a little further: the
> HFileBlockIndex shall index on the last key value of each data block instead
> of on the start key value. The motivation is to solve the HBASE-4443
> issue (avoid seeking to the "previous" block when the key you are interested
> in is the first one of a block) as well as +the defects mentioned above+.
> For example, if the target key value is "smaller" than the start key value of
> data block N, there is no way to know for sure whether the target key value
> is in data block N or N-1, so the scanner has to seek from data block N-1.
> However, if the block index is based on the last key value of each data block
> and the target key value is between the last key value of data block N-1 and
> that of data block N, then the target key value is in data block N for sure.
> As long as HBase only supports forward scans, it makes more sense to
> index on the last key value than on the start key value.
> Thanks Kannan and Mikhail for the insightful discussions and suggestions.
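
The two proposals quoted above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from the patch: the class LastKeyIndexSketch, its methods, and the use of plain String keys are all hypothetical, and the real HFileBlockIndex is multi-level and works on serialized key values. The sketch assumes the index stores the last key of each data block and that reseeks only move forward.

```java
import java.util.Arrays;

/** Toy model of a block index keyed on each data block's LAST key (hypothetical names). */
final class LastKeyIndexSketch {
    // lastKeys[i] is the last key stored in data block i; sorted ascending.
    private final String[] lastKeys;
    private int currentBlock = -1; // block the scanner is positioned on, -1 = none
    private int indexLookups = 0;  // how often the index was actually consulted

    LastKeyIndexSketch(String... lastKeys) {
        this.lastKeys = lastKeys.clone();
    }

    /** Index lookup: first block whose last key is >= target, or -1 if past the end. */
    int findBlock(String target) {
        indexLookups++;
        int pos = Arrays.binarySearch(lastKeys, target);
        int block = pos >= 0 ? pos : -pos - 1; // insertion point when not found
        return block < lastKeys.length ? block : -1;
    }

    /**
     * Forward reseek. Assumes the target is not before the scanner's current
     * position, so a target <= the current block's last key must fall in the
     * current block and the index does not need to be consulted.
     */
    int reseekTo(String target) {
        if (currentBlock >= 0 && target.compareTo(lastKeys[currentBlock]) <= 0) {
            return currentBlock;
        }
        currentBlock = findBlock(target);
        return currentBlock;
    }

    int indexLookups() {
        return indexLookups;
    }

    public static void main(String[] args) {
        // Three data blocks whose last keys are "d", "h" and "m".
        LastKeyIndexSketch scanner = new LastKeyIndexSketch("d", "h", "m");
        for (String key : new String[] {"a", "c", "d", "e", "m", "z"}) {
            System.out.println(key + " -> block " + scanner.reseekTo(key));
        }
        System.out.println("index lookups: " + scanner.indexLookups());
    }
}
```

In this model a sequential scan touches the index only when it crosses a block boundary, which is the effect the first proposal aims for, and a target that lies between the last keys of blocks N-1 and N resolves directly to block N, which is the point of the second proposal.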
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira