[ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145964#comment-17145964
 ] 

Andrew Kyle Purtell edited comment on HBASE-24637 at 6/26/20, 1:47 AM:
-----------------------------------------------------------------------

If you'd like to play around with the instrumentation, see attached patches, 
for branch-1 and branch-2.2, respectively. After applying them add this to 
log4j.properties:

{noformat}
log4j.logger.org.apache.hadoop.hbase.ipc.CallRunner=TRACE
{noformat}

Here's an example output from a scan of {{hbase:meta}}:

{noformat}
2020-06-24 14:45:13,870 TRACE 
[RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=8120] 
ipc.CallRunner: callId: 0 service: ClientService methodName: Scan size: 102
connection: 10.55.111.78:54551 deadline: 1593035713822 successful: true
metrics: [ "block_read_keys": 477 "block_read_ns": 3427040
 "block_reads": 13 "block_seek_ns": 1606370 "block_seeks": 169
 "block_unpack_ns": 10256 "block_unpacks": 13 "cells_matched": 165
 "cells_matched__hbase:meta,,1.1588230740__info": 165
 "column_hint_include": 148 "memstore_next": 72
 "memstore_next_ns": 136671 "memstore_seek": 2
 "memstore_seek_ns": 631629 "reseeks": 36 "sqm_hint_done": 17
 "sqm_hint_include": 74 "sqm_hint_seek_next_col": 74
 "store_next": 276
 "store_next__1c930a35ff8041368a05817adbdcce97": 40
 "store_next__2644194fdf794815abdc940c183dab88": 40
 "store_next__32ce31753fb244668f788fb94ab02dff": 40
 "store_next__61c8423b9d8846c99a61cd2996b5b621": 116
 "store_next__f4f7878c9fcf40d9902416d5c7a4097a": 40
 "store_next_ns": 1891634
 "store_next_ns__1c930a35ff8041368a05817adbdcce97": 269383
 "store_next_ns__2644194fdf794815abdc940c183dab88": 299936
 "store_next_ns__32ce31753fb244668f788fb94ab02dff": 288594
 "store_next_ns__61c8423b9d8846c99a61cd2996b5b621": 594313
 "store_next_ns__f4f7878c9fcf40d9902416d5c7a4097a": 439408
 "store_reseek": 164
 "store_reseek__1c930a35ff8041368a05817adbdcce97": 32
 "store_reseek__2644194fdf794815abdc940c183dab88": 32
 "store_reseek__32ce31753fb244668f788fb94ab02dff": 32
 "store_reseek__61c8423b9d8846c99a61cd2996b5b621": 36
 "store_reseek__f4f7878c9fcf40d9902416d5c7a4097a": 32
 "store_reseek_ns": 2969978
 "store_reseek_ns__1c930a35ff8041368a05817adbdcce97": 359489
 "store_reseek_ns__2644194fdf794815abdc940c183dab88": 595115
 "store_reseek_ns__32ce31753fb244668f788fb94ab02dff": 474642
 "store_reseek_ns__61c8423b9d8846c99a61cd2996b5b621": 1013188
 "store_reseek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 527544
 "store_seek": 5
 "store_seek__1c930a35ff8041368a05817adbdcce97": 1
 "store_seek__2644194fdf794815abdc940c183dab88": 1
 "store_seek__32ce31753fb244668f788fb94ab02dff": 1
 "store_seek__61c8423b9d8846c99a61cd2996b5b621": 1
 "store_seek__f4f7878c9fcf40d9902416d5c7a4097a": 1
 "store_seek_ns": 8862786
 "store_seek_ns__1c930a35ff8041368a05817adbdcce97": 830421
 "store_seek_ns__2644194fdf794815abdc940c183dab88": 585899
 "store_seek_ns__32ce31753fb244668f788fb94ab02dff": 483605
 "store_seek_ns__61c8423b9d8846c99a61cd2996b5b621": 5958072
 "store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 1004789
 "versions_hint_include": 74 "versions_hint_seek_next_col": 74 ]
{noformat}

You can use the attached perl script to aggregate all CallRunner trace logging 
for a given table. Here's an example from the PE --filterAll case over the 100 
column table on HBase 2.2:

{noformat}
COUNT: 72
block_read_ns 16022572976
block_reads 1157963
block_unpack_ns 40930352
block_unpacks 1157963
cached_block_read_ns 191684215
cached_block_reads 1157330
cells_matched 1010000000
cells_matched__TestTable_f1_c100,,1592866205366.0315e8ccd0024d0460970325194853e1.__info0
 5459454
cells_matched__TestTable_f1_c100,00000000000000000000054054,1592866610801.d6f9b09463fc61ff59fcb3daed655f94.__info0
 39100332
cells_matched__TestTable_f1_c100,00000000000000000000441186,1592867432331.17af74d64f57ef963aadacbf0ae617db.__info0
 101404202
cells_matched__TestTable_f1_c100,00000000000000000001445188,1592867920908.92ef6284a81f29d96dd4cb764f6590f3.__info0
 130169002
cells_matched__TestTable_f1_c100,00000000000000000002733990,1592868498045.7541760abc640c3714dabe8d78fd7a4f.__info0
 124239494
cells_matched__TestTable_f1_c100,00000000000000000003964084,1592869106036.da6e844c39f943a6d3ed41addf728972.__info0
 139113562
cells_matched__TestTable_f1_c100,00000000000000000005341446,1592869725318.39a71397e112f7edb7e9bff443ac8913.__info0
 147105288
cells_matched__TestTable_f1_c100,00000000000000000006797934,1592869725318.8e4d45733bc5e46e8b345d7dd91bd2f0.__info0
 323408666
column_hint_include 1000000000
filter_hint_skip 1000000000
reseeks 1163490
seeker_next 1000000000
seeker_next_ns 92121300195
sqm_hint_done 9999992
sqm_hint_seek_next_col 990000008
sqm_hint_seek_next_row 10000000
store_next 998836518
store_next__03573d0d16474c73a315c5d2a1f25986 145479362
store_next__0d570a284b274c7686ebbaacc1786110 128730258
store_next__622d9f19372d478aadb7112dad739106 137576008
store_next__83be76bc01a74b9e9831c743e25b73b8 319835506
store_next__a74e26cbc87741689f71f44dca266c85 5398444
store_next__d43b019ef3b24abea32261fc5af2b59b 100283189
store_next__e4bf47ff9f4f4642818fb4a48641b0c2 122866173
store_next__ef169c9106cd4695ba066e98e26a5367 38667578
store_next_ns 227944975778
store_next_ns__03573d0d16474c73a315c5d2a1f25986 32928142224
store_next_ns__0d570a284b274c7686ebbaacc1786110 29647340635
store_next_ns__622d9f19372d478aadb7112dad739106 31970878120
store_next_ns__83be76bc01a74b9e9831c743e25b73b8 71921842595
store_next_ns__a74e26cbc87741689f71f44dca266c85 1348713596
store_next_ns__d43b019ef3b24abea32261fc5af2b59b 22877802458
store_next_ns__e4bf47ff9f4f4642818fb4a48641b0c2 28531329445
store_next_ns__ef169c9106cd4695ba066e98e26a5367 8718926705
store_reseek 1163490
store_reseek__03573d0d16474c73a315c5d2a1f25986 169439
store_reseek__0d570a284b274c7686ebbaacc1786110 149943
store_reseek__622d9f19372d478aadb7112dad739106 160193
store_reseek__83be76bc01a74b9e9831c743e25b73b8 371095
store_reseek__a74e26cbc87741689f71f44dca266c85 6957
store_reseek__d43b019ef3b24abea32261fc5af2b59b 117012
store_reseek__e4bf47ff9f4f4642818fb4a48641b0c2 143228
store_reseek__ef169c9106cd4695ba066e98e26a5367 45623
store_reseek_ns 4065247883
store_reseek_ns__03573d0d16474c73a315c5d2a1f25986 579002475
store_reseek_ns__0d570a284b274c7686ebbaacc1786110 522858515
store_reseek_ns__622d9f19372d478aadb7112dad739106 568928308
store_reseek_ns__83be76bc01a74b9e9831c743e25b73b8 1279376574
store_reseek_ns__a74e26cbc87741689f71f44dca266c85 60820452
store_reseek_ns__d43b019ef3b24abea32261fc5af2b59b 403998215
store_reseek_ns__e4bf47ff9f4f4642818fb4a48641b0c2 494403682
store_reseek_ns__ef169c9106cd4695ba066e98e26a5367 155859662
versions_hint_include_and_seek_next_col 990000000
versions_hint_include_and_seek_next_row 10000000
{noformat}


was (Author: apurtell):
If you'd like to play around with the instrumentation, see attached patches, 
for branch-1 and branch-2.2, respectively. After applying them add this to 
log4j.properties:

{noformat}
log4j.logger.org.apache.hadoop.hbase.ipc.CallRunner=TRACE
{noformat}

Here's an example output from a scan of {{hbase:meta}}:

{noformat}
2020-06-24 14:45:13,870 TRACE 
[RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=8120] ipc.CallRunner: 
callId: 0 service: ClientService methodName: Scan size: 102 connection: 
10.55.111.78:54551 deadline: 1593035713822 successful: true metrics: [ 
"block_read_keys": 477 "block_read_ns": 3427040 "block_reads": 13 
"block_seek_ns": 1606370 "block_seeks": 169 "block_unpack_ns": 10256 
"block_unpacks": 13 "cells_matched": 165 
"cells_matched__hbase:meta,,1.1588230740__info": 165 "column_hint_include": 148 
"memstore_next": 72 "memstore_next_ns": 136671 "memstore_seek": 2 
"memstore_seek_ns": 631629 "reseeks": 36 "sqm_hint_done": 17 
"sqm_hint_include": 74 "sqm_hint_seek_next_col": 74 "store_next": 276 
"store_next__1c930a35ff8041368a05817adbdcce97": 40 
"store_next__2644194fdf794815abdc940c183dab88": 40 
"store_next__32ce31753fb244668f788fb94ab02dff": 40 
"store_next__61c8423b9d8846c99a61cd2996b5b621": 116 
"store_next__f4f7878c9fcf40d9902416d5c7a4097a": 40 "store_next_ns": 1891634 
"store_next_ns__1c930a35ff8041368a05817adbdcce97": 269383 
"store_next_ns__2644194fdf794815abdc940c183dab88": 299936 
"store_next_ns__32ce31753fb244668f788fb94ab02dff": 288594 
"store_next_ns__61c8423b9d8846c99a61cd2996b5b621": 594313 
"store_next_ns__f4f7878c9fcf40d9902416d5c7a4097a": 439408 "store_reseek": 164 
"store_reseek__1c930a35ff8041368a05817adbdcce97": 32 
"store_reseek__2644194fdf794815abdc940c183dab88": 32 
"store_reseek__32ce31753fb244668f788fb94ab02dff": 32 
"store_reseek__61c8423b9d8846c99a61cd2996b5b621": 36 
"store_reseek__f4f7878c9fcf40d9902416d5c7a4097a": 32 "store_reseek_ns": 2969978 
"store_reseek_ns__1c930a35ff8041368a05817adbdcce97": 359489 
"store_reseek_ns__2644194fdf794815abdc940c183dab88": 595115 
"store_reseek_ns__32ce31753fb244668f788fb94ab02dff": 474642 
"store_reseek_ns__61c8423b9d8846c99a61cd2996b5b621": 1013188 
"store_reseek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 527544 "store_seek": 5 
"store_seek__1c930a35ff8041368a05817adbdcce97": 1 
"store_seek__2644194fdf794815abdc940c183dab88": 1 
"store_seek__32ce31753fb244668f788fb94ab02dff": 1 
"store_seek__61c8423b9d8846c99a61cd2996b5b621": 1 
"store_seek__f4f7878c9fcf40d9902416d5c7a4097a": 1 "store_seek_ns": 8862786 
"store_seek_ns__1c930a35ff8041368a05817adbdcce97": 830421 
"store_seek_ns__2644194fdf794815abdc940c183dab88": 585899 
"store_seek_ns__32ce31753fb244668f788fb94ab02dff": 483605 
"store_seek_ns__61c8423b9d8846c99a61cd2996b5b621": 5958072 
"store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 1004789 
"versions_hint_include": 74 "versions_hint_seek_next_col": 74 ]
{noformat}

You can use the attached perl script to aggregate all CallRunner trace logging 
for a given table. Here's an example from the PE --filterAll case over the 100 
column table on HBase 2.2:

{noformat}
COUNT: 72
block_read_ns 16022572976
block_reads 1157963
block_unpack_ns 40930352
block_unpacks 1157963
cached_block_read_ns 191684215
cached_block_reads 1157330
cells_matched 1010000000
cells_matched__TestTable_f1_c100,,1592866205366.0315e8ccd0024d0460970325194853e1.__info0
 5459454
cells_matched__TestTable_f1_c100,00000000000000000000054054,1592866610801.d6f9b09463fc61ff59fcb3daed655f94.__info0
 39100332
cells_matched__TestTable_f1_c100,00000000000000000000441186,1592867432331.17af74d64f57ef963aadacbf0ae617db.__info0
 101404202
cells_matched__TestTable_f1_c100,00000000000000000001445188,1592867920908.92ef6284a81f29d96dd4cb764f6590f3.__info0
 130169002
cells_matched__TestTable_f1_c100,00000000000000000002733990,1592868498045.7541760abc640c3714dabe8d78fd7a4f.__info0
 124239494
cells_matched__TestTable_f1_c100,00000000000000000003964084,1592869106036.da6e844c39f943a6d3ed41addf728972.__info0
 139113562
cells_matched__TestTable_f1_c100,00000000000000000005341446,1592869725318.39a71397e112f7edb7e9bff443ac8913.__info0
 147105288
cells_matched__TestTable_f1_c100,00000000000000000006797934,1592869725318.8e4d45733bc5e46e8b345d7dd91bd2f0.__info0
 323408666
column_hint_include 1000000000
filter_hint_skip 1000000000
reseeks 1163490
seeker_next 1000000000
seeker_next_ns 92121300195
sqm_hint_done 9999992
sqm_hint_seek_next_col 990000008
sqm_hint_seek_next_row 10000000
store_next 998836518
store_next__03573d0d16474c73a315c5d2a1f25986 145479362
store_next__0d570a284b274c7686ebbaacc1786110 128730258
store_next__622d9f19372d478aadb7112dad739106 137576008
store_next__83be76bc01a74b9e9831c743e25b73b8 319835506
store_next__a74e26cbc87741689f71f44dca266c85 5398444
store_next__d43b019ef3b24abea32261fc5af2b59b 100283189
store_next__e4bf47ff9f4f4642818fb4a48641b0c2 122866173
store_next__ef169c9106cd4695ba066e98e26a5367 38667578
store_next_ns 227944975778
store_next_ns__03573d0d16474c73a315c5d2a1f25986 32928142224
store_next_ns__0d570a284b274c7686ebbaacc1786110 29647340635
store_next_ns__622d9f19372d478aadb7112dad739106 31970878120
store_next_ns__83be76bc01a74b9e9831c743e25b73b8 71921842595
store_next_ns__a74e26cbc87741689f71f44dca266c85 1348713596
store_next_ns__d43b019ef3b24abea32261fc5af2b59b 22877802458
store_next_ns__e4bf47ff9f4f4642818fb4a48641b0c2 28531329445
store_next_ns__ef169c9106cd4695ba066e98e26a5367 8718926705
store_reseek 1163490
store_reseek__03573d0d16474c73a315c5d2a1f25986 169439
store_reseek__0d570a284b274c7686ebbaacc1786110 149943
store_reseek__622d9f19372d478aadb7112dad739106 160193
store_reseek__83be76bc01a74b9e9831c743e25b73b8 371095
store_reseek__a74e26cbc87741689f71f44dca266c85 6957
store_reseek__d43b019ef3b24abea32261fc5af2b59b 117012
store_reseek__e4bf47ff9f4f4642818fb4a48641b0c2 143228
store_reseek__ef169c9106cd4695ba066e98e26a5367 45623
store_reseek_ns 4065247883
store_reseek_ns__03573d0d16474c73a315c5d2a1f25986 579002475
store_reseek_ns__0d570a284b274c7686ebbaacc1786110 522858515
store_reseek_ns__622d9f19372d478aadb7112dad739106 568928308
store_reseek_ns__83be76bc01a74b9e9831c743e25b73b8 1279376574
store_reseek_ns__a74e26cbc87741689f71f44dca266c85 60820452
store_reseek_ns__d43b019ef3b24abea32261fc5af2b59b 403998215
store_reseek_ns__e4bf47ff9f4f4642818fb4a48641b0c2 494403682
store_reseek_ns__ef169c9106cd4695ba066e98e26a5367 155859662
versions_hint_include_and_seek_next_col 990000000
versions_hint_include_and_seek_next_row 10000000
{noformat}

> Filter SKIP hinting regression
> ------------------------------
>
>                 Key: HBASE-24637
>                 URL: https://issues.apache.org/jira/browse/HBASE-24637
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters, Performance, Scanners
>    Affects Versions: 2.2.5
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to