[ https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145964#comment-17145964 ]
Andrew Kyle Purtell edited comment on HBASE-24637 at 6/26/20, 1:47 AM: ----------------------------------------------------------------------- If you'd like to play around with the instrumentation, see attached patches, for branch-1 and branch-2.2, respectively. After applying them add this to log4j.properties: {noformat} log4j.logger.org.apache.hadoop.hbase.ipc.CallRunner=TRACE {noformat} Here's an example output from a scan of {{hbase:meta}}: {noformat} 2020-06-24 14:45:13,870 TRACE [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=8120] ipc.CallRunner: callId: 0 service: ClientService methodName: Scan size: 102 connection: 10.55.111.78:54551 deadline: 1593035713822 successful: true metrics: [ "block_read_keys": 477 "block_read_ns": 3427040 "block_reads": 13 "block_seek_ns": 1606370 "block_seeks": 169 "block_unpack_ns": 10256 "block_unpacks": 13 "cells_matched": 165 "cells_matched__hbase:meta,,1.1588230740__info": 165 "column_hint_include": 148 "memstore_next": 72 "memstore_next_ns": 136671 "memstore_seek": 2 "memstore_seek_ns": 631629 "reseeks": 36 "sqm_hint_done": 17 "sqm_hint_include": 74 "sqm_hint_seek_next_col": 74 "store_next": 276 "store_next__1c930a35ff8041368a05817adbdcce97": 40 "store_next__2644194fdf794815abdc940c183dab88": 40 "store_next__32ce31753fb244668f788fb94ab02dff": 40 "store_next__61c8423b9d8846c99a61cd2996b5b621": 116 "store_next__f4f7878c9fcf40d9902416d5c7a4097a": 40 "store_next_ns": 1891634 "store_next_ns__1c930a35ff8041368a05817adbdcce97": 269383 "store_next_ns__2644194fdf794815abdc940c183dab88": 299936 "store_next_ns__32ce31753fb244668f788fb94ab02dff": 288594 "store_next_ns__61c8423b9d8846c99a61cd2996b5b621": 594313 "store_next_ns__f4f7878c9fcf40d9902416d5c7a4097a": 439408 "store_reseek": 164 "store_reseek__1c930a35ff8041368a05817adbdcce97": 32 "store_reseek__2644194fdf794815abdc940c183dab88": 32 "store_reseek__32ce31753fb244668f788fb94ab02dff": 32 "store_reseek__61c8423b9d8846c99a61cd2996b5b621": 36 "store_reseek__f4f7878c9fcf40d9902416d5c7a4097a": 32 "store_reseek_ns": 2969978 "store_reseek_ns__1c930a35ff8041368a05817adbdcce97": 359489 "store_reseek_ns__2644194fdf794815abdc940c183dab88": 595115 "store_reseek_ns__32ce31753fb244668f788fb94ab02dff": 474642 "store_reseek_ns__61c8423b9d8846c99a61cd2996b5b621": 1013188 "store_reseek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 527544 "store_seek": 5 "store_seek__1c930a35ff8041368a05817adbdcce97": 1 "store_seek__2644194fdf794815abdc940c183dab88": 1 "store_seek__32ce31753fb244668f788fb94ab02dff": 1 "store_seek__61c8423b9d8846c99a61cd2996b5b621": 1 "store_seek__f4f7878c9fcf40d9902416d5c7a4097a": 1 "store_seek_ns": 8862786 "store_seek_ns__1c930a35ff8041368a05817adbdcce97": 830421 "store_seek_ns__2644194fdf794815abdc940c183dab88": 585899 "store_seek_ns__32ce31753fb244668f788fb94ab02dff": 483605 "store_seek_ns__61c8423b9d8846c99a61cd2996b5b621": 5958072 "store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 1004789 "versions_hint_include": 74 "versions_hint_seek_next_col": 74 ] {noformat} You can use the attached perl script to aggregate all CallRunner trace logging for a given table. Here's an example from the PE --filterAll case over the 100 column table on HBase 2.2: {noformat} COUNT: 72 block_read_ns 16022572976 block_reads 1157963 block_unpack_ns 40930352 block_unpacks 1157963 cached_block_read_ns 191684215 cached_block_reads 1157330 cells_matched 1010000000 cells_matched__TestTable_f1_c100,,1592866205366.0315e8ccd0024d0460970325194853e1.__info0 5459454 cells_matched__TestTable_f1_c100,00000000000000000000054054,1592866610801.d6f9b09463fc61ff59fcb3daed655f94.__info0 39100332 cells_matched__TestTable_f1_c100,00000000000000000000441186,1592867432331.17af74d64f57ef963aadacbf0ae617db.__info0 101404202 cells_matched__TestTable_f1_c100,00000000000000000001445188,1592867920908.92ef6284a81f29d96dd4cb764f6590f3.__info0 130169002 cells_matched__TestTable_f1_c100,00000000000000000002733990,1592868498045.7541760abc640c3714dabe8d78fd7a4f.__info0 124239494 cells_matched__TestTable_f1_c100,00000000000000000003964084,1592869106036.da6e844c39f943a6d3ed41addf728972.__info0 139113562 cells_matched__TestTable_f1_c100,00000000000000000005341446,1592869725318.39a71397e112f7edb7e9bff443ac8913.__info0 147105288 cells_matched__TestTable_f1_c100,00000000000000000006797934,1592869725318.8e4d45733bc5e46e8b345d7dd91bd2f0.__info0 323408666 column_hint_include 1000000000 filter_hint_skip 1000000000 reseeks 1163490 seeker_next 1000000000 seeker_next_ns 92121300195 sqm_hint_done 9999992 sqm_hint_seek_next_col 990000008 sqm_hint_seek_next_row 10000000 store_next 998836518 store_next__03573d0d16474c73a315c5d2a1f25986 145479362 store_next__0d570a284b274c7686ebbaacc1786110 128730258 store_next__622d9f19372d478aadb7112dad739106 137576008 store_next__83be76bc01a74b9e9831c743e25b73b8 319835506 store_next__a74e26cbc87741689f71f44dca266c85 5398444 store_next__d43b019ef3b24abea32261fc5af2b59b 100283189 store_next__e4bf47ff9f4f4642818fb4a48641b0c2 122866173 store_next__ef169c9106cd4695ba066e98e26a5367 38667578 store_next_ns 227944975778 store_next_ns__03573d0d16474c73a315c5d2a1f25986 32928142224 store_next_ns__0d570a284b274c7686ebbaacc1786110 29647340635 store_next_ns__622d9f19372d478aadb7112dad739106 31970878120 store_next_ns__83be76bc01a74b9e9831c743e25b73b8 71921842595 store_next_ns__a74e26cbc87741689f71f44dca266c85 1348713596 store_next_ns__d43b019ef3b24abea32261fc5af2b59b 22877802458 store_next_ns__e4bf47ff9f4f4642818fb4a48641b0c2 28531329445 store_next_ns__ef169c9106cd4695ba066e98e26a5367 8718926705 store_reseek 1163490 store_reseek__03573d0d16474c73a315c5d2a1f25986 169439 store_reseek__0d570a284b274c7686ebbaacc1786110 149943 store_reseek__622d9f19372d478aadb7112dad739106 160193 store_reseek__83be76bc01a74b9e9831c743e25b73b8 371095 store_reseek__a74e26cbc87741689f71f44dca266c85 6957 store_reseek__d43b019ef3b24abea32261fc5af2b59b 117012 store_reseek__e4bf47ff9f4f4642818fb4a48641b0c2 143228 store_reseek__ef169c9106cd4695ba066e98e26a5367 45623 store_reseek_ns 4065247883 store_reseek_ns__03573d0d16474c73a315c5d2a1f25986 579002475 store_reseek_ns__0d570a284b274c7686ebbaacc1786110 522858515 store_reseek_ns__622d9f19372d478aadb7112dad739106 568928308 store_reseek_ns__83be76bc01a74b9e9831c743e25b73b8 1279376574 store_reseek_ns__a74e26cbc87741689f71f44dca266c85 60820452 store_reseek_ns__d43b019ef3b24abea32261fc5af2b59b 403998215 store_reseek_ns__e4bf47ff9f4f4642818fb4a48641b0c2 494403682 store_reseek_ns__ef169c9106cd4695ba066e98e26a5367 155859662 versions_hint_include_and_seek_next_col 990000000 versions_hint_include_and_seek_next_row 10000000 {noformat} was (Author: apurtell): If you'd like to play around with the instrumentation, see attached patches, for branch-1 and branch-2.2, respectively. After applying them add this to log4j.properties: {noformat} log4j.logger.org.apache.hadoop.hbase.ipc.CallRunner=TRACE {noformat} Here's an example output from a scan of {{hbase:meta}}: {noformat} 2020-06-24 14:45:13,870 TRACE [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=8120] ipc.CallRunner: callId: 0 service: ClientService methodName: Scan size: 102 connection: 10.55.111.78:54551 deadline: 1593035713822 successful: true metrics: [ "block_read_keys": 477 "block_read_ns": 3427040 "block_reads": 13 "block_seek_ns": 1606370 "block_seeks": 169 "block_unpack_ns": 10256 "block_unpacks": 13 "cells_matched": 165 "cells_matched__hbase:meta,,1.1588230740__info": 165 "column_hint_include": 148 "memstore_next": 72 "memstore_next_ns": 136671 "memstore_seek": 2 "memstore_seek_ns": 631629 "reseeks": 36 "sqm_hint_done": 17 "sqm_hint_include": 74 "sqm_hint_seek_next_col": 74 "store_next": 276 "store_next__1c930a35ff8041368a05817adbdcce97": 40 "store_next__2644194fdf794815abdc940c183dab88": 40 "store_next__32ce31753fb244668f788fb94ab02dff": 40 "store_next__61c8423b9d8846c99a61cd2996b5b621": 116 "store_next__f4f7878c9fcf40d9902416d5c7a4097a": 40 "store_next_ns": 1891634 "store_next_ns__1c930a35ff8041368a05817adbdcce97": 269383 "store_next_ns__2644194fdf794815abdc940c183dab88": 299936 "store_next_ns__32ce31753fb244668f788fb94ab02dff": 288594 "store_next_ns__61c8423b9d8846c99a61cd2996b5b621": 594313 "store_next_ns__f4f7878c9fcf40d9902416d5c7a4097a": 439408 "store_reseek": 164 "store_reseek__1c930a35ff8041368a05817adbdcce97": 32 "store_reseek__2644194fdf794815abdc940c183dab88": 32 "store_reseek__32ce31753fb244668f788fb94ab02dff": 32 "store_reseek__61c8423b9d8846c99a61cd2996b5b621": 36 "store_reseek__f4f7878c9fcf40d9902416d5c7a4097a": 32 "store_reseek_ns": 2969978 "store_reseek_ns__1c930a35ff8041368a05817adbdcce97": 359489 "store_reseek_ns__2644194fdf794815abdc940c183dab88": 595115 "store_reseek_ns__32ce31753fb244668f788fb94ab02dff": 474642 "store_reseek_ns__61c8423b9d8846c99a61cd2996b5b621": 1013188 "store_reseek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 527544 "store_seek": 5 "store_seek__1c930a35ff8041368a05817adbdcce97": 1 "store_seek__2644194fdf794815abdc940c183dab88": 1 "store_seek__32ce31753fb244668f788fb94ab02dff": 1 "store_seek__61c8423b9d8846c99a61cd2996b5b621": 1 "store_seek__f4f7878c9fcf40d9902416d5c7a4097a": 1 "store_seek_ns": 8862786 "store_seek_ns__1c930a35ff8041368a05817adbdcce97": 830421 "store_seek_ns__2644194fdf794815abdc940c183dab88": 585899 "store_seek_ns__32ce31753fb244668f788fb94ab02dff": 483605 "store_seek_ns__61c8423b9d8846c99a61cd2996b5b621": 5958072 "store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 1004789 "versions_hint_include": 74 "versions_hint_seek_next_col": 74 ] {noformat} You can use the attached perl script to aggregate all CallRunner trace logging for a given table. Here's an example from the PE --filterAll case over the 100 column table on HBase 2.2: {noformat} COUNT: 72 block_read_ns 16022572976 block_reads 1157963 block_unpack_ns 40930352 block_unpacks 1157963 cached_block_read_ns 191684215 cached_block_reads 1157330 cells_matched 1010000000 cells_matched__TestTable_f1_c100,,1592866205366.0315e8ccd0024d0460970325194853e1.__info0 5459454 cells_matched__TestTable_f1_c100,00000000000000000000054054,1592866610801.d6f9b09463fc61ff59fcb3daed655f94.__info0 39100332 cells_matched__TestTable_f1_c100,00000000000000000000441186,1592867432331.17af74d64f57ef963aadacbf0ae617db.__info0 101404202 cells_matched__TestTable_f1_c100,00000000000000000001445188,1592867920908.92ef6284a81f29d96dd4cb764f6590f3.__info0 130169002 cells_matched__TestTable_f1_c100,00000000000000000002733990,1592868498045.7541760abc640c3714dabe8d78fd7a4f.__info0 124239494 cells_matched__TestTable_f1_c100,00000000000000000003964084,1592869106036.da6e844c39f943a6d3ed41addf728972.__info0 139113562 cells_matched__TestTable_f1_c100,00000000000000000005341446,1592869725318.39a71397e112f7edb7e9bff443ac8913.__info0 147105288 cells_matched__TestTable_f1_c100,00000000000000000006797934,1592869725318.8e4d45733bc5e46e8b345d7dd91bd2f0.__info0 323408666 column_hint_include 1000000000 filter_hint_skip 1000000000 reseeks 1163490 seeker_next 1000000000 seeker_next_ns 92121300195 sqm_hint_done 9999992 sqm_hint_seek_next_col 990000008 sqm_hint_seek_next_row 10000000 store_next 998836518 store_next__03573d0d16474c73a315c5d2a1f25986 145479362 store_next__0d570a284b274c7686ebbaacc1786110 128730258 store_next__622d9f19372d478aadb7112dad739106 137576008 store_next__83be76bc01a74b9e9831c743e25b73b8 319835506 store_next__a74e26cbc87741689f71f44dca266c85 5398444 store_next__d43b019ef3b24abea32261fc5af2b59b 100283189 store_next__e4bf47ff9f4f4642818fb4a48641b0c2 122866173 store_next__ef169c9106cd4695ba066e98e26a5367 38667578 store_next_ns 227944975778 store_next_ns__03573d0d16474c73a315c5d2a1f25986 32928142224 store_next_ns__0d570a284b274c7686ebbaacc1786110 29647340635 store_next_ns__622d9f19372d478aadb7112dad739106 31970878120 store_next_ns__83be76bc01a74b9e9831c743e25b73b8 71921842595 store_next_ns__a74e26cbc87741689f71f44dca266c85 1348713596 store_next_ns__d43b019ef3b24abea32261fc5af2b59b 22877802458 store_next_ns__e4bf47ff9f4f4642818fb4a48641b0c2 28531329445 store_next_ns__ef169c9106cd4695ba066e98e26a5367 8718926705 store_reseek 1163490 store_reseek__03573d0d16474c73a315c5d2a1f25986 169439 store_reseek__0d570a284b274c7686ebbaacc1786110 149943 store_reseek__622d9f19372d478aadb7112dad739106 160193 store_reseek__83be76bc01a74b9e9831c743e25b73b8 371095 store_reseek__a74e26cbc87741689f71f44dca266c85 6957 store_reseek__d43b019ef3b24abea32261fc5af2b59b 117012 store_reseek__e4bf47ff9f4f4642818fb4a48641b0c2 143228 store_reseek__ef169c9106cd4695ba066e98e26a5367 45623 store_reseek_ns 4065247883 store_reseek_ns__03573d0d16474c73a315c5d2a1f25986 579002475 store_reseek_ns__0d570a284b274c7686ebbaacc1786110 522858515 store_reseek_ns__622d9f19372d478aadb7112dad739106 568928308 store_reseek_ns__83be76bc01a74b9e9831c743e25b73b8 1279376574 store_reseek_ns__a74e26cbc87741689f71f44dca266c85 60820452 store_reseek_ns__d43b019ef3b24abea32261fc5af2b59b 403998215 store_reseek_ns__e4bf47ff9f4f4642818fb4a48641b0c2 494403682 store_reseek_ns__ef169c9106cd4695ba066e98e26a5367 155859662 versions_hint_include_and_seek_next_col 990000000 versions_hint_include_and_seek_next_row 10000000 {noformat} > Filter SKIP hinting regression > ------------------------------ > > Key: HBASE-24637 > URL: https://issues.apache.org/jira/browse/HBASE-24637 > Project: HBase > Issue Type: Bug > Components: Filters, Performance, Scanners > Affects Versions: 2.2.5 > Reporter: Andrew Kyle Purtell > Priority: Major > Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, > W-7665966-Instrument-low-level-scan-details-branch-1.patch, > W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, > parse_call_trace.pl > > > I have been looking into reported performance regressions in HBase 2 relative > to HBase 1. Depending on the test scenario, HBase 2 can demonstrate > significantly better microbenchmarks in a number of cases, and usually shows > improvement in whole cluster benchmarks like YCSB. > To assist in debugging I added methods to RpcServer for updating per-call > metrics that leverage the fact it puts a reference to the current Call into a > thread local and that all activity for a given RPC is processed by a single > thread context. I then instrumented ScanQueryMatcher (in branch-1) and its > various friends (in branch-2.2), StoreScanner, HFileReaderV2 and > HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, > and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables > with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per > row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 > and 2.2 versions under test operated on identical data files in HDFS. For > tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to > ensure only the server side differed. > The results for pe --filterAll were revealing. See attached. > It appears a refactor to ScanQueryMatcher and friends has disabled the > ability of filters to provide meaningful SKIP hints, which disables an > optimization that avoids reseeking, leading to a serious and proportional > regression in reseek activity and time spent in that code path. So for > queries that use filters, there can be a substantial regression. > Other test cases that did not use filters did not show this regression. If > filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was > almost identical, as measured by counts of the hint types returned, whether > or not column or version trackers are called, and counts of store seeks or > reseeks. Regarding micro-timings, there was a 10% variance in my testing and > results generally fell within this range, except for the filter all case of > course. -- This message was sent by Atlassian Jira (v8.3.4#803005)