[
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145964#comment-17145964
]
Andrew Kyle Purtell edited comment on HBASE-24637 at 6/26/20, 1:47 AM:
-----------------------------------------------------------------------
If you'd like to play around with the instrumentation, see attached patches,
for branch-1 and branch-2.2, respectively. After applying them add this to
log4j.properties:
{noformat}
log4j.logger.org.apache.hadoop.hbase.ipc.CallRunner=TRACE
{noformat}
Here's an example output from a scan of {{hbase:meta}}:
{noformat}
2020-06-24 14:45:13,870 TRACE
[RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=8120]
ipc.CallRunner: callId: 0 service: ClientService methodName: Scan size: 102
connection: 10.55.111.78:54551 deadline: 1593035713822 successful: true
metrics: [ "block_read_keys": 477 "block_read_ns": 3427040
"block_reads": 13 "block_seek_ns": 1606370 "block_seeks": 169
"block_unpack_ns": 10256 "block_unpacks": 13 "cells_matched": 165
"cells_matched__hbase:meta,,1.1588230740__info": 165
"column_hint_include": 148 "memstore_next": 72
"memstore_next_ns": 136671 "memstore_seek": 2
"memstore_seek_ns": 631629 "reseeks": 36 "sqm_hint_done": 17
"sqm_hint_include": 74 "sqm_hint_seek_next_col": 74
"store_next": 276
"store_next__1c930a35ff8041368a05817adbdcce97": 40
"store_next__2644194fdf794815abdc940c183dab88": 40
"store_next__32ce31753fb244668f788fb94ab02dff": 40
"store_next__61c8423b9d8846c99a61cd2996b5b621": 116
"store_next__f4f7878c9fcf40d9902416d5c7a4097a": 40
"store_next_ns": 1891634
"store_next_ns__1c930a35ff8041368a05817adbdcce97": 269383
"store_next_ns__2644194fdf794815abdc940c183dab88": 299936
"store_next_ns__32ce31753fb244668f788fb94ab02dff": 288594
"store_next_ns__61c8423b9d8846c99a61cd2996b5b621": 594313
"store_next_ns__f4f7878c9fcf40d9902416d5c7a4097a": 439408
"store_reseek": 164
"store_reseek__1c930a35ff8041368a05817adbdcce97": 32
"store_reseek__2644194fdf794815abdc940c183dab88": 32
"store_reseek__32ce31753fb244668f788fb94ab02dff": 32
"store_reseek__61c8423b9d8846c99a61cd2996b5b621": 36
"store_reseek__f4f7878c9fcf40d9902416d5c7a4097a": 32
"store_reseek_ns": 2969978
"store_reseek_ns__1c930a35ff8041368a05817adbdcce97": 359489
"store_reseek_ns__2644194fdf794815abdc940c183dab88": 595115
"store_reseek_ns__32ce31753fb244668f788fb94ab02dff": 474642
"store_reseek_ns__61c8423b9d8846c99a61cd2996b5b621": 1013188
"store_reseek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 527544
"store_seek": 5
"store_seek__1c930a35ff8041368a05817adbdcce97": 1
"store_seek__2644194fdf794815abdc940c183dab88": 1
"store_seek__32ce31753fb244668f788fb94ab02dff": 1
"store_seek__61c8423b9d8846c99a61cd2996b5b621": 1
"store_seek__f4f7878c9fcf40d9902416d5c7a4097a": 1
"store_seek_ns": 8862786
"store_seek_ns__1c930a35ff8041368a05817adbdcce97": 830421
"store_seek_ns__2644194fdf794815abdc940c183dab88": 585899
"store_seek_ns__32ce31753fb244668f788fb94ab02dff": 483605
"store_seek_ns__61c8423b9d8846c99a61cd2996b5b621": 5958072
"store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 1004789
"versions_hint_include": 74 "versions_hint_seek_next_col": 74 ]
{noformat}
You can use the attached perl script to aggregate all CallRunner trace logging
for a given table. Here's an example from the PE --filterAll case over the 100
column table on HBase 2.2:
{noformat}
COUNT: 72
block_read_ns 16022572976
block_reads 1157963
block_unpack_ns 40930352
block_unpacks 1157963
cached_block_read_ns 191684215
cached_block_reads 1157330
cells_matched 1010000000
cells_matched__TestTable_f1_c100,,1592866205366.0315e8ccd0024d0460970325194853e1.__info0
5459454
cells_matched__TestTable_f1_c100,00000000000000000000054054,1592866610801.d6f9b09463fc61ff59fcb3daed655f94.__info0
39100332
cells_matched__TestTable_f1_c100,00000000000000000000441186,1592867432331.17af74d64f57ef963aadacbf0ae617db.__info0
101404202
cells_matched__TestTable_f1_c100,00000000000000000001445188,1592867920908.92ef6284a81f29d96dd4cb764f6590f3.__info0
130169002
cells_matched__TestTable_f1_c100,00000000000000000002733990,1592868498045.7541760abc640c3714dabe8d78fd7a4f.__info0
124239494
cells_matched__TestTable_f1_c100,00000000000000000003964084,1592869106036.da6e844c39f943a6d3ed41addf728972.__info0
139113562
cells_matched__TestTable_f1_c100,00000000000000000005341446,1592869725318.39a71397e112f7edb7e9bff443ac8913.__info0
147105288
cells_matched__TestTable_f1_c100,00000000000000000006797934,1592869725318.8e4d45733bc5e46e8b345d7dd91bd2f0.__info0
323408666
column_hint_include 1000000000
filter_hint_skip 1000000000
reseeks 1163490
seeker_next 1000000000
seeker_next_ns 92121300195
sqm_hint_done 9999992
sqm_hint_seek_next_col 990000008
sqm_hint_seek_next_row 10000000
store_next 998836518
store_next__03573d0d16474c73a315c5d2a1f25986 145479362
store_next__0d570a284b274c7686ebbaacc1786110 128730258
store_next__622d9f19372d478aadb7112dad739106 137576008
store_next__83be76bc01a74b9e9831c743e25b73b8 319835506
store_next__a74e26cbc87741689f71f44dca266c85 5398444
store_next__d43b019ef3b24abea32261fc5af2b59b 100283189
store_next__e4bf47ff9f4f4642818fb4a48641b0c2 122866173
store_next__ef169c9106cd4695ba066e98e26a5367 38667578
store_next_ns 227944975778
store_next_ns__03573d0d16474c73a315c5d2a1f25986 32928142224
store_next_ns__0d570a284b274c7686ebbaacc1786110 29647340635
store_next_ns__622d9f19372d478aadb7112dad739106 31970878120
store_next_ns__83be76bc01a74b9e9831c743e25b73b8 71921842595
store_next_ns__a74e26cbc87741689f71f44dca266c85 1348713596
store_next_ns__d43b019ef3b24abea32261fc5af2b59b 22877802458
store_next_ns__e4bf47ff9f4f4642818fb4a48641b0c2 28531329445
store_next_ns__ef169c9106cd4695ba066e98e26a5367 8718926705
store_reseek 1163490
store_reseek__03573d0d16474c73a315c5d2a1f25986 169439
store_reseek__0d570a284b274c7686ebbaacc1786110 149943
store_reseek__622d9f19372d478aadb7112dad739106 160193
store_reseek__83be76bc01a74b9e9831c743e25b73b8 371095
store_reseek__a74e26cbc87741689f71f44dca266c85 6957
store_reseek__d43b019ef3b24abea32261fc5af2b59b 117012
store_reseek__e4bf47ff9f4f4642818fb4a48641b0c2 143228
store_reseek__ef169c9106cd4695ba066e98e26a5367 45623
store_reseek_ns 4065247883
store_reseek_ns__03573d0d16474c73a315c5d2a1f25986 579002475
store_reseek_ns__0d570a284b274c7686ebbaacc1786110 522858515
store_reseek_ns__622d9f19372d478aadb7112dad739106 568928308
store_reseek_ns__83be76bc01a74b9e9831c743e25b73b8 1279376574
store_reseek_ns__a74e26cbc87741689f71f44dca266c85 60820452
store_reseek_ns__d43b019ef3b24abea32261fc5af2b59b 403998215
store_reseek_ns__e4bf47ff9f4f4642818fb4a48641b0c2 494403682
store_reseek_ns__ef169c9106cd4695ba066e98e26a5367 155859662
versions_hint_include_and_seek_next_col 990000000
versions_hint_include_and_seek_next_row 10000000
{noformat}
was (Author: apurtell):
If you'd like to play around with the instrumentation, see attached patches,
for branch-1 and branch-2.2, respectively. After applying them add this to
log4j.properties:
{noformat}
log4j.logger.org.apache.hadoop.hbase.ipc.CallRunner=TRACE
{noformat}
Here's an example output from a scan of {{hbase:meta}}:
{noformat}
2020-06-24 14:45:13,870 TRACE
[RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=8120] ipc.CallRunner:
callId: 0 service: ClientService methodName: Scan size: 102 connection:
10.55.111.78:54551 deadline: 1593035713822 successful: true metrics: [
"block_read_keys": 477 "block_read_ns": 3427040 "block_reads": 13
"block_seek_ns": 1606370 "block_seeks": 169 "block_unpack_ns": 10256
"block_unpacks": 13 "cells_matched": 165
"cells_matched__hbase:meta,,1.1588230740__info": 165 "column_hint_include": 148
"memstore_next": 72 "memstore_next_ns": 136671 "memstore_seek": 2
"memstore_seek_ns": 631629 "reseeks": 36 "sqm_hint_done": 17
"sqm_hint_include": 74 "sqm_hint_seek_next_col": 74 "store_next": 276
"store_next__1c930a35ff8041368a05817adbdcce97": 40
"store_next__2644194fdf794815abdc940c183dab88": 40
"store_next__32ce31753fb244668f788fb94ab02dff": 40
"store_next__61c8423b9d8846c99a61cd2996b5b621": 116
"store_next__f4f7878c9fcf40d9902416d5c7a4097a": 40 "store_next_ns": 1891634
"store_next_ns__1c930a35ff8041368a05817adbdcce97": 269383
"store_next_ns__2644194fdf794815abdc940c183dab88": 299936
"store_next_ns__32ce31753fb244668f788fb94ab02dff": 288594
"store_next_ns__61c8423b9d8846c99a61cd2996b5b621": 594313
"store_next_ns__f4f7878c9fcf40d9902416d5c7a4097a": 439408 "store_reseek": 164
"store_reseek__1c930a35ff8041368a05817adbdcce97": 32
"store_reseek__2644194fdf794815abdc940c183dab88": 32
"store_reseek__32ce31753fb244668f788fb94ab02dff": 32
"store_reseek__61c8423b9d8846c99a61cd2996b5b621": 36
"store_reseek__f4f7878c9fcf40d9902416d5c7a4097a": 32 "store_reseek_ns": 2969978
"store_reseek_ns__1c930a35ff8041368a05817adbdcce97": 359489
"store_reseek_ns__2644194fdf794815abdc940c183dab88": 595115
"store_reseek_ns__32ce31753fb244668f788fb94ab02dff": 474642
"store_reseek_ns__61c8423b9d8846c99a61cd2996b5b621": 1013188
"store_reseek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 527544 "store_seek": 5
"store_seek__1c930a35ff8041368a05817adbdcce97": 1
"store_seek__2644194fdf794815abdc940c183dab88": 1
"store_seek__32ce31753fb244668f788fb94ab02dff": 1
"store_seek__61c8423b9d8846c99a61cd2996b5b621": 1
"store_seek__f4f7878c9fcf40d9902416d5c7a4097a": 1 "store_seek_ns": 8862786
"store_seek_ns__1c930a35ff8041368a05817adbdcce97": 830421
"store_seek_ns__2644194fdf794815abdc940c183dab88": 585899
"store_seek_ns__32ce31753fb244668f788fb94ab02dff": 483605
"store_seek_ns__61c8423b9d8846c99a61cd2996b5b621": 5958072
"store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 1004789
"versions_hint_include": 74 "versions_hint_seek_next_col": 74 ]
{noformat}
You can use the attached perl script to aggregate all CallRunner trace logging
for a given table. Here's an example from the PE --filterAll case over the 100
column table on HBase 2.2:
{noformat}
COUNT: 72
block_read_ns 16022572976
block_reads 1157963
block_unpack_ns 40930352
block_unpacks 1157963
cached_block_read_ns 191684215
cached_block_reads 1157330
cells_matched 1010000000
cells_matched__TestTable_f1_c100,,1592866205366.0315e8ccd0024d0460970325194853e1.__info0
5459454
cells_matched__TestTable_f1_c100,00000000000000000000054054,1592866610801.d6f9b09463fc61ff59fcb3daed655f94.__info0
39100332
cells_matched__TestTable_f1_c100,00000000000000000000441186,1592867432331.17af74d64f57ef963aadacbf0ae617db.__info0
101404202
cells_matched__TestTable_f1_c100,00000000000000000001445188,1592867920908.92ef6284a81f29d96dd4cb764f6590f3.__info0
130169002
cells_matched__TestTable_f1_c100,00000000000000000002733990,1592868498045.7541760abc640c3714dabe8d78fd7a4f.__info0
124239494
cells_matched__TestTable_f1_c100,00000000000000000003964084,1592869106036.da6e844c39f943a6d3ed41addf728972.__info0
139113562
cells_matched__TestTable_f1_c100,00000000000000000005341446,1592869725318.39a71397e112f7edb7e9bff443ac8913.__info0
147105288
cells_matched__TestTable_f1_c100,00000000000000000006797934,1592869725318.8e4d45733bc5e46e8b345d7dd91bd2f0.__info0
323408666
column_hint_include 1000000000
filter_hint_skip 1000000000
reseeks 1163490
seeker_next 1000000000
seeker_next_ns 92121300195
sqm_hint_done 9999992
sqm_hint_seek_next_col 990000008
sqm_hint_seek_next_row 10000000
store_next 998836518
store_next__03573d0d16474c73a315c5d2a1f25986 145479362
store_next__0d570a284b274c7686ebbaacc1786110 128730258
store_next__622d9f19372d478aadb7112dad739106 137576008
store_next__83be76bc01a74b9e9831c743e25b73b8 319835506
store_next__a74e26cbc87741689f71f44dca266c85 5398444
store_next__d43b019ef3b24abea32261fc5af2b59b 100283189
store_next__e4bf47ff9f4f4642818fb4a48641b0c2 122866173
store_next__ef169c9106cd4695ba066e98e26a5367 38667578
store_next_ns 227944975778
store_next_ns__03573d0d16474c73a315c5d2a1f25986 32928142224
store_next_ns__0d570a284b274c7686ebbaacc1786110 29647340635
store_next_ns__622d9f19372d478aadb7112dad739106 31970878120
store_next_ns__83be76bc01a74b9e9831c743e25b73b8 71921842595
store_next_ns__a74e26cbc87741689f71f44dca266c85 1348713596
store_next_ns__d43b019ef3b24abea32261fc5af2b59b 22877802458
store_next_ns__e4bf47ff9f4f4642818fb4a48641b0c2 28531329445
store_next_ns__ef169c9106cd4695ba066e98e26a5367 8718926705
store_reseek 1163490
store_reseek__03573d0d16474c73a315c5d2a1f25986 169439
store_reseek__0d570a284b274c7686ebbaacc1786110 149943
store_reseek__622d9f19372d478aadb7112dad739106 160193
store_reseek__83be76bc01a74b9e9831c743e25b73b8 371095
store_reseek__a74e26cbc87741689f71f44dca266c85 6957
store_reseek__d43b019ef3b24abea32261fc5af2b59b 117012
store_reseek__e4bf47ff9f4f4642818fb4a48641b0c2 143228
store_reseek__ef169c9106cd4695ba066e98e26a5367 45623
store_reseek_ns 4065247883
store_reseek_ns__03573d0d16474c73a315c5d2a1f25986 579002475
store_reseek_ns__0d570a284b274c7686ebbaacc1786110 522858515
store_reseek_ns__622d9f19372d478aadb7112dad739106 568928308
store_reseek_ns__83be76bc01a74b9e9831c743e25b73b8 1279376574
store_reseek_ns__a74e26cbc87741689f71f44dca266c85 60820452
store_reseek_ns__d43b019ef3b24abea32261fc5af2b59b 403998215
store_reseek_ns__e4bf47ff9f4f4642818fb4a48641b0c2 494403682
store_reseek_ns__ef169c9106cd4695ba066e98e26a5367 155859662
versions_hint_include_and_seek_next_col 990000000
versions_hint_include_and_seek_next_row 10000000
{noformat}
> Filter SKIP hinting regression
> ------------------------------
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
> Issue Type: Bug
> Components: Filters, Performance, Scanners
> Affects Versions: 2.2.5
> Reporter: Andrew Kyle Purtell
> Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf,
> W-7665966-Instrument-low-level-scan-details-branch-1.patch,
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch,
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate
> significantly better microbenchmarks in a number of cases, and usually shows
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call
> metrics that leverage the fact it puts a reference to the current Call into a
> thread local and that all activity for a given RPC is processed by a single
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock,
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6
> and 2.2 versions under test operated on identical data files in HDFS. For
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached.
> It appears a refactor to ScanQueryMatcher and friends has disabled the
> ability of filters to provide meaningful SKIP hints, which disables an
> optimization that avoids reseeking, leading to a serious and proportional
> regression in reseek activity and time spent in that code path. So for
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was
> almost identical, as measured by counts of the hint types returned, whether
> or not column or version trackers are called, and counts of store seeks or
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and
> results generally fell within this range, except for the filter all case of
> course.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)