[ 
https://issues.apache.org/jira/browse/HBASE-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776008#comment-17776008
 ] 

Becker Ewing commented on HBASE-28043:
--------------------------------------

After some offline discussion with [~bbeaudreault], I trialed a "lazy 
optimization" path that kicks in on the 2nd consecutive call to 
"seekToPreviousRow" as opposed to the 1st: 
[https://gist.github.com/jbewing/e721a1f1f2615591512385f4b37da496] 

 

I also decided to benchmark _everything_ a lot more as a single benchmark run 
gives some signal, but not a super clear signal in our case. I repeated the 
above benchmarking methodology (and have included run #1 for "master" and 
"patch" in this table) along with "patch w/ lazy opt" which is the "lazy 
optimization" patch mentioned above.

 

I got the following results:

 
||Benchmark||Revision||Avg Latency (us)||Avg Throughput (rows / sec)||
|metaRandomRead #1|master|839|11898|
|metaRandomRead #2|master|789|12656|
|metaRandomRead #3|master|785|12714|
|metaRandomRead #1|patch|891|11203|
|metaRandomRead #2|patch|843|11840|
|metaRandomRead #3|patch|894|11181|
|metaRandomRead #1|patch w/ lazy opt|896|11134|
|metaRandomRead #2|patch w/ lazy opt|845|11825|
|metaRandomRead #3|patch w/ lazy opt|815|12239|
|metaRandomRead #4|patch w/ lazy opt|850|11749|

 

If anyone is interested in looking at per-thread results/response time 
histograms, I've pasted the raw results output logs of "hbase pe 
--nomapred=true metaRandomRead 10" for the newer above results (#2-3 master, 
#2-3 patch, #1-4 patch w/ lazy opt) in [this 
gist|https://gist.github.com/jbewing/185d7f45d1f43e4f1ee87a45fb3206bf].

 

I'm not entirely sure what to make of this. Both patches seem to be roughly 
equal in terms of benchmarking with master being about 50 microseconds faster 
on average. Given that 50 microseconds is a fraction of a datacenter RTT, maybe 
the regression won't be that apparent. On the other hand, I'm not sure that I 
can even say with too much authority that master + the patch (and variation) 
differ only be 50 microseconds as the benchmarking numbers are all over the 
place.

> Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow
> --------------------------------------------------------------------------
>
>                 Key: HBASE-28043
>                 URL: https://issues.apache.org/jira/browse/HBASE-28043
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Becker Ewing
>            Assignee: Becker Ewing
>            Priority: Major
>         Attachments: Current_SeekToPreviousRowBehavior.png, 
> Proposed_SeekToPreviousRowBehavior.png
>
>
> Currently, for non-RIV1 DBE encodings, each call to 
> [StoreFileScanner.seekToPreviousRow|https://github.com/apache/hbase/blob/89ca7f4ade84c84a246281c71898543b6161c099/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L493-L506]
>  (a common operation in reverse scans) results in two seeks: 
>  # Seek from the beginning of the block to before the given row to find the 
> prior row
>  # Seek from the beginning of the block to the first cell of the prior row
> So if there are "N" rows in a block, a reverse scan through each row results 
> in seeking past 2(N-1)! rows.
>  
> This is a particularly expensive operation for tall tables that have many 
> rows in a block.
>  
> By introducing a state variable "previousRow" to StoreFileScanner, I believe 
> that we could modify the seeking algorithm to be:
>  # Seek from the beginning of the block to before the given row to find the 
> prior row
>  # Seek from the beginning of the block to before the row that is before the 
> row that was just seeked to (i.e. 2 rows back). _Save_ this as a hint for 
> where the prior row is in "previousRow"
>  # Reseek from "previousRow" (2 rows back from start) to 1 row back from 
> start (to the actual previousRow)
> Then the rest of the calls where a "previousRow" is present, you just need to 
> seek to the beginning of the block once instead of twice, i.e. 
>  # seek from the beginning of the block to right before the beginning of your 
> "previousRow" marker. Save this as the new "previousRow" marker
>  # Reseek to the next row (i.e. your previous "previousRow" marker)
>  
> If there are "N" rows in a block, a reverse scan from row N to row 0 results 
> in seeking past approximately (N-1)! rows i.e. 50% less than the current 
> behavior.
>  
> See the attached diagrams for the current and proposed behavior. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to