[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671750#comment-16671750 ]
Ted Yu commented on HBASE-21418:
--------------------------------
For the new test, I ran it without the rest of the patch:
{code}
Running org.apache.hadoop.hbase.client.TestLookAheadBeforeReseek
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 24.647 sec - in org.apache.hadoop.hbase.client.TestLookAheadBeforeReseek
{code}
What is TestLookAheadBeforeReseek supposed to show without the fix?
> Reduce a number of reseek operations in MemstoreScanner when seek point is
> close to the current row.
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-21418
> URL: https://issues.apache.org/jira/browse/HBASE-21418
> Project: HBase
> Issue Type: Improvement
> Components: scan, Scanners
> Affects Versions: 1.2.5
> Reporter: Jeongdae Kim
> Assignee: Jeongdae Kim
> Priority: Minor
> Labels: performance
> Attachments: HBASE-21418.branch-1.2.001.patch
>
>
> We observed “responseTooSlow” logs for Get requests in our production
> clusters. Some Get requests even took more than 10 seconds to return.
> The affected Get requests used a time range, and the target rows have many
> columns, each with several versions.
> We reproduced the issue and found that it happens only when scanning
> the memstore. After flushing the HStore, the slow responses disappeared
> and the same Get requests returned very quickly.
>
> We investigated this case and found that the performance difference between
> the memstore scanner and the HFile scanner is caused by the number of reseek
> operations executed while scanning. When a store scanner needs to reseek to
> the next column, the HFile scanner decides whether it actually has to reseek
> by checking whether the seek point is in the current block, whereas the
> memstore scanner always reseeks without any such check. In our case, almost
> all columns in the memstore have timestamps older than the Scan's (Get's)
> time range, so roughly one reseek occurs per column. This sporadically
> increases the response time of Get requests.
>
> To improve the reseek operation of the memstore scanner, I think it is better
> to skip than to seek when a reseek is requested, if the seek point is quite
> close to the cell the scanner is currently pointing at. (In fact, I changed
> MatchCode.SEEK_NEXT_COL to MatchCode.SKIP in our case, and Get responses
> were 6x faster than before.) But we can’t tell whether the seek point is
> close to the current cell, because the memstore scanner has no
> information such as a next block index.
> Before HBASE-13109, Scan.HINT_LOOKAHEAD was introduced to handle cases like
> this, and it may be deprecated someday. But I think that hint is still
> useful for letting the memstore scanner try skipping first, before reseeking,
> and with this option we can make the memstore scanner's reseek operations
> smarter.
>
> I tested this patch in our case and got the same result as when I changed
> the match code (mentioned above).
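
The skip-before-reseek idea in the description can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from the attached patch: plain strings in a ConcurrentSkipListSet stand in for memstore cells, and ToyScanner, reseekWithLookAhead, scanAllColumns, and the "cNN/vN" key format are all hypothetical names made up for the illustration. The scanner first tries up to lookAhead cheap next() steps and falls back to a full reseek (a fresh tailSet lookup) only if the seek point was not reached.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

// Toy model only: plain strings stand in for cells; none of this is HBase code.
final class LookAheadReseekSketch {

    /** Hypothetical scanner over a sorted set of keys. */
    static final class ToyScanner {
        private final ConcurrentSkipListSet<String> cells;
        private Iterator<String> iter;
        private String current;   // cell the scanner points at, or null when exhausted
        int reseeks = 0;          // full re-positioning operations (skip-list searches)
        int skips = 0;            // cheap next() steps

        ToyScanner(ConcurrentSkipListSet<String> cells) {
            this.cells = cells;
            this.iter = cells.iterator();
            advance();
        }

        private void advance() {
            current = iter.hasNext() ? iter.next() : null;
        }

        private boolean isBefore(String seekKey) {
            return current != null && current.compareTo(seekKey) < 0;
        }

        String current() {
            return current;
        }

        /** Full reseek: search the skip list again for the first key >= seekKey. */
        void reseek(String seekKey) {
            reseeks++;
            iter = cells.tailSet(seekKey, true).iterator();
            advance();
        }

        /** Try up to lookAhead next() steps first; reseek only if still short of seekKey. */
        void reseekWithLookAhead(String seekKey, int lookAhead) {
            for (int i = 0; i < lookAhead && isBefore(seekKey); i++) {
                skips++;
                advance();
            }
            if (isBefore(seekKey)) {
                reseek(seekKey);
            }
        }
    }

    /** Walks every column, asking for the next column each time; returns {reseeks, skips}. */
    static int[] scanAllColumns(int columns, int versions, int lookAhead) {
        ConcurrentSkipListSet<String> cells = new ConcurrentSkipListSet<>();
        for (int c = 0; c < columns; c++) {
            for (int v = 1; v <= versions; v++) {
                cells.add(String.format("c%02d/v%d", c, v));
            }
        }
        ToyScanner scanner = new ToyScanner(cells);
        while (scanner.current() != null) {
            String column = scanner.current().substring(0, 3);
            // Assumed seek key: '~' sorts after every version suffix of this column.
            scanner.reseekWithLookAhead(column + "/~", lookAhead);
        }
        return new int[] {scanner.reseeks, scanner.skips};
    }

    public static void main(String[] args) {
        for (int lookAhead : new int[] {0, 2, 3}) {
            int[] r = scanAllColumns(4, 3, lookAhead);
            System.out.printf("lookAhead=%d reseeks=%d skips=%d%n", lookAhead, r[0], r[1]);
        }
    }
}
```

In this model, a look-ahead at least as large as the number of versions per column removes every reseek, while a smaller look-ahead pays for the skips and still reseeks once per column. Whether the trade-off helps in a real memstore depends on the actual cost of a reseek relative to next(), which this sketch does not measure.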
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)