[ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221703#comment-17221703
 ] 

Andrew Kyle Purtell commented on HBASE-24637:
---------------------------------------------

[~ram_krish] We definitely do reseek. I have instrumented the reseek() call. In 
1.x it is called one time per store file in the region. In 2.x it is called 
many times, in proportion to something other than the number of store files in 
the region, and the count grows as the number of cells to scan increases. See 
the table below. That is an increase in store reseek activity of roughly four 
to five orders of magnitude in these tests. Now, maybe the reseek doesn't 
actually do IO, but wall clock time measured by store_reseek_ms increases, so 
that's real work on the CPU that doesn't happen at all in 1.x.

 
||hbase_version||columns_in_test_case||seeker_next||store_reseek||store_reseek_ms||
|1|1|10000000|1|2.71|
|2|1|10000000|11733|62.17|
|1|5|50000000|2|10.93|
|2|5|50000000|59924|233.67|
|1|10|100000000|8|10.88|
|2|10|100000000|120607|401.88|
|1|100|1000000000|8|24.32|
|2|100|1000000000|1163490|4065.25|

 
This reseek time matches the magnitude of the degradation.
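
To make the magnitude concrete, the ratios between the 1.x and 2.x rows can be computed directly from the table. The following is only a small arithmetic sketch: the class and field names are made up for illustration, and the numbers are copied from the table rows above.

```java
import java.util.Locale;

// Illustrative arithmetic over the table above; class and field names are hypothetical.
final class ReseekComparison {
    // columns_in_test_case for each pair of rows
    static final int[] COLUMNS = {1, 5, 10, 100};
    // store_reseek counts, copied from the table
    static final long[] RESEEKS_1X = {1, 2, 8, 8};
    static final long[] RESEEKS_2X = {11733, 59924, 120607, 1163490};
    // store_reseek_ms values, copied from the table
    static final double[] RESEEK_MS_1X = {2.71, 10.93, 10.88, 24.32};
    static final double[] RESEEK_MS_2X = {62.17, 233.67, 401.88, 4065.25};

    // How many times larger the 2.x value is than the 1.x value.
    static double ratio(double v1x, double v2x) {
        return v2x / v1x;
    }

    public static void main(String[] args) {
        for (int i = 0; i < COLUMNS.length; i++) {
            System.out.printf(Locale.ROOT,
                "columns=%d reseek_ratio=%.0f reseek_ms_ratio=%.1f extra_reseek_ms=%.2f%n",
                COLUMNS[i],
                ratio(RESEEKS_1X[i], RESEEKS_2X[i]),
                ratio(RESEEK_MS_1X[i], RESEEK_MS_2X[i]),
                RESEEK_MS_2X[i] - RESEEK_MS_1X[i]);
        }
    }
}
```

The count ratio and the time ratio differ a lot, which is consistent with each individual reseek being cheap while the aggregate is not.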

Regarding your point about comparisons, the metrics I have collected indicate 
that is very likely. For example, when we don't get SKIP the version tracker 
becomes involved, and given a scan of 50M rows, the version tracker does 50M 
more comparisons in 2.x than it would in 1.x (where it would do 0 comparisons).
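
The comparison count can be illustrated with a toy model. This is not HBase code: the enum, class, and method names are hypothetical, and the control flow is a deliberate simplification that assumes one version tracker comparison per cell that reaches the tracker. It only shows why losing the SKIP short-circuit adds one comparison per scanned cell.

```java
// Toy model only; names are hypothetical and this is not the ScanQueryMatcher logic.
final class VersionTrackerToyModel {
    enum FilterHint { INCLUDE, SKIP }

    // Path where a SKIP hint drops the cell before the version tracker is consulted.
    static long comparisonsWithSkipShortCircuit(long cells, FilterHint hintForEveryCell) {
        long comparisons = 0;
        for (long i = 0; i < cells; i++) {
            if (hintForEveryCell == FilterHint.SKIP) {
                continue; // cell is skipped, tracker never sees it
            }
            comparisons++; // assumed: one tracker comparison per remaining cell
        }
        return comparisons;
    }

    // Path where the SKIP hint is not honored early, so every cell reaches the tracker.
    static long comparisonsWithoutShortCircuit(long cells, FilterHint hintForEveryCell) {
        long comparisons = 0;
        for (long i = 0; i < cells; i++) {
            comparisons++; // tracker is consulted regardless of the hint
        }
        return comparisons;
    }

    public static void main(String[] args) {
        long cells = 1_000; // small stand-in for the 50M rows mentioned above
        System.out.println(comparisonsWithSkipShortCircuit(cells, FilterHint.SKIP));
        System.out.println(comparisonsWithoutShortCircuit(cells, FilterHint.SKIP));
    }
}
```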

> Reseek regression related to filter SKIP hinting
> ------------------------------------------------
>
>                 Key: HBASE-24637
>                 URL: https://issues.apache.org/jira/browse/HBASE-24637
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters, Performance, Scanners
>    Affects Versions: 2.2.5
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshotted, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 
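
The per-call metrics idea in the quoted description (a thread local holding the current call, with all work for one RPC on a single thread) can be sketched roughly as follows. This is a minimal sketch and not the attached patch: the Call class here is a hypothetical stand-in rather than the RpcServer type, and the method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of per-call counters kept via a thread local; all names are hypothetical.
final class PerCallMetrics {
    // Stand-in for an RPC call object; NOT the HBase Call class.
    static final class Call {
        final Map<String, Long> counters = new HashMap<>();
    }

    // Assumes one thread handles all activity for a given call.
    private static final ThreadLocal<Call> CURRENT_CALL = new ThreadLocal<>();

    static void begin(Call call) {
        CURRENT_CALL.set(call);
    }

    static void end() {
        CURRENT_CALL.remove();
    }

    // Adds delta to a named counter on the current call, if there is one.
    static void increment(String name, long delta) {
        Call call = CURRENT_CALL.get();
        if (call != null) {
            call.counters.merge(name, delta, Long::sum);
        }
    }

    public static void main(String[] args) {
        Call call = new Call();
        begin(call);
        try {
            increment("store_reseek", 1);
            increment("store_reseek", 1);
            increment("seeker_next", 10);
        } finally {
            end();
        }
        System.out.println(call.counters);
    }
}
```

Because each call is confined to one thread in this model, the counter map needs no locking. Instrumented code deep in the scan path can call increment() without having the call object passed down to it.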



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
