[ https://issues.apache.org/jira/browse/HBASE-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562731#comment-17562731 ]
Nick Dimiduk commented on HBASE-27155:
--------------------------------------
[~apurtell]
I haven't performed a 1-to-1 audit of these metrics against the data I'm attaching
to span events in HBASE-27153, but a quick look suggests that we could collect all
of this data via tracing. It may be challenging to extract the span event data
from the span repository; I haven't tried interacting with the likes of Jaeger or
X-Ray at that API level. That would be the next step toward replicating the above.
Let me take a closer look at the metrics you expose in your patch and see what
I can do with OTel.
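To make the idea concrete, here is a minimal stdlib-only sketch (no OpenTelemetry dependency) of the kind of per-scan accumulator the quoted patch implies. The class `ScanDetailCollector` and its methods are hypothetical; only the key naming convention (an aggregate key, a `__<store file>` breakdown, and an `_ns` suffix for timings) is taken from the metrics dump quoted below. Under this assumption, the `snapshot()` map is the flat key/value shape that could be attached as attributes on a single span event.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-scan accumulator. Key names follow the convention seen in
// the quoted metrics dump: "store_seek", "store_seek__<storefile>",
// "store_seek_ns", "store_seek_ns__<storefile>".
final class ScanDetailCollector {
    private final Map<String, Long> values = new TreeMap<>();

    // Count one occurrence of an operation; storeFile may be null when the
    // operation is not attributable to a single store file.
    void count(String metric, String storeFile) {
        add(metric, 1L, storeFile);
    }

    // Accumulate elapsed nanoseconds under "<metric>_ns".
    void time(String metric, long elapsedNanos, String storeFile) {
        add(metric + "_ns", elapsedNanos, storeFile);
    }

    private void add(String key, long delta, String storeFile) {
        values.merge(key, delta, Long::sum); // aggregate across all files
        if (storeFile != null) {
            values.merge(key + "__" + storeFile, delta, Long::sum); // per-file breakdown
        }
    }

    // Copy of the current values, e.g. to attach as span event attributes.
    Map<String, Long> snapshot() {
        return new TreeMap<>(values);
    }

    public static void main(String[] args) {
        ScanDetailCollector c = new ScanDetailCollector();
        c.count("store_seek", "fileA");
        c.time("store_seek", 1_000L, "fileA");
        c.count("store_seek", "fileB");
        c.time("store_seek", 3_000L, "fileB");
        c.count("cells_matched", null);
        System.out.println(c.snapshot());
    }
}
```

Keeping the aggregate and per-file keys in one flat map mirrors the dump, at the cost of one extra map update per operation that can be attributed to a store file.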
> Improvements to low level scanner tracing
> -----------------------------------------
>
> Key: HBASE-27155
> URL: https://issues.apache.org/jira/browse/HBASE-27155
> Project: HBase
> Issue Type: Improvement
> Components: Scanners, tracing
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> Related to HBASE-27153, consider tracer semantic attributes for low-level
> scanner details.
> Consider
> https://issues.apache.org/jira/secure/attachment/13006571/W-7665966-Instrument-low-level-scan-details-branch-2.2.patch
> (from HBASE-24637). This was used to collect detailed metrics on the
> decisions made by ScanQueryMatcher and related classes.
> {noformat}
> metrics: [ "block_read_keys": 477 "block_read_ns": 3427040
> "block_reads": 13 "block_seek_ns": 1606370 "block_seeks": 169
> "block_unpack_ns": 10256 "block_unpacks": 13 "cells_matched": 165
> "cells_matched__hbase:meta,,1.1588230740__info": 165
> "column_hint_include": 148 "memstore_next": 72
> "memstore_next_ns": 136671 "memstore_seek": 2
> "memstore_seek_ns": 631629 "reseeks": 36 "sqm_hint_done": 17
> "sqm_hint_include": 74 "sqm_hint_seek_next_col": 74
> "store_next": 276
> "store_next__1c930a35ff8041368a05817adbdcce97": 40
> "store_next__2644194fdf794815abdc940c183dab88": 40
> "store_next__32ce31753fb244668f788fb94ab02dff": 40
> "store_next__61c8423b9d8846c99a61cd2996b5b621": 116
> "store_next__f4f7878c9fcf40d9902416d5c7a4097a": 40
> "store_next_ns": 1891634
> "store_next_ns__1c930a35ff8041368a05817adbdcce97": 269383
> "store_next_ns__2644194fdf794815abdc940c183dab88": 299936
> "store_next_ns__32ce31753fb244668f788fb94ab02dff": 288594
> "store_next_ns__61c8423b9d8846c99a61cd2996b5b621": 594313
> "store_next_ns__f4f7878c9fcf40d9902416d5c7a4097a": 439408
> "store_reseek": 164
> "store_reseek__1c930a35ff8041368a05817adbdcce97": 32
> "store_reseek__2644194fdf794815abdc940c183dab88": 32
> "store_reseek__32ce31753fb244668f788fb94ab02dff": 32
> "store_reseek__61c8423b9d8846c99a61cd2996b5b621": 36
> "store_reseek__f4f7878c9fcf40d9902416d5c7a4097a": 32
> "store_reseek_ns": 2969978
> "store_reseek_ns__1c930a35ff8041368a05817adbdcce97": 359489
> "store_reseek_ns__2644194fdf794815abdc940c183dab88": 595115
> "store_reseek_ns__32ce31753fb244668f788fb94ab02dff": 474642
> "store_reseek_ns__61c8423b9d8846c99a61cd2996b5b621": 1013188
> "store_reseek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 527544
> "store_seek": 5
> "store_seek__1c930a35ff8041368a05817adbdcce97": 1
> "store_seek__2644194fdf794815abdc940c183dab88": 1
> "store_seek__32ce31753fb244668f788fb94ab02dff": 1
> "store_seek__61c8423b9d8846c99a61cd2996b5b621": 1
> "store_seek__f4f7878c9fcf40d9902416d5c7a4097a": 1
> "store_seek_ns": 8862786
> "store_seek_ns__1c930a35ff8041368a05817adbdcce97": 830421
> "store_seek_ns__2644194fdf794815abdc940c183dab88": 585899
> "store_seek_ns__32ce31753fb244668f788fb94ab02dff": 483605
> "store_seek_ns__61c8423b9d8846c99a61cd2996b5b621": 5958072
> "store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a": 1004789
> "versions_hint_include": 74 "versions_hint_seek_next_col": 74 ]
> {noformat}
> We can see the difference between seek and reseek times, along with their
> counts, so we can analyze whether SQM is making optimal choices or whether
> its behavior has changed, and we can identify particular store file(s) that
> might be outliers when hunting for the source of a regression. We get the
> average time required to unpack a block. We get counts of the hints supplied
> by base SQM functionality or by filters. We get the contributions to query
> processing time from the memstore and from store files separately.
> Perhaps this can be done only for scans that are selected for tracing. Of
> course there is a performance concern, so the overhead must be truly
> conditional on whether the path is being actively traced, and it must be
> measured carefully before deciding whether to commit it.
> WDYT [~ndimiduk] [~zhangduo]
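The analysis described in the issue reduces to simple arithmetic over the dump. As a sketch (the class and method names are hypothetical), the code below computes mean nanoseconds per operation from a count/total pair and finds the store file with the largest per-file value relative to the median, using a subset of the numbers from the quoted output. The count and time keys are passed separately because the dump does not name them uniformly (`block_unpacks` vs `block_unpack_ns`).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Worked example over a subset of the metrics quoted in HBASE-27155.
final class ScanMetricsAnalysis {

    // Mean nanoseconds per operation; returns 0.0 when the count is missing or zero.
    static double nanosPerOp(Map<String, Long> m, String countKey, String nanosKey) {
        long count = m.getOrDefault(countKey, 0L);
        long nanos = m.getOrDefault(nanosKey, 0L);
        return count == 0 ? 0.0 : (double) nanos / count;
    }

    // Store file with the largest "<prefix>__<file>" value, paired with its
    // ratio to the (upper) median across files. Assumes at least one file entry.
    static Map.Entry<String, Double> worstStoreFile(Map<String, Long> m, String prefix) {
        String marker = prefix + "__";
        Map<String, Long> perFile = new TreeMap<>();
        for (Map.Entry<String, Long> e : m.entrySet()) {
            if (e.getKey().startsWith(marker)) {
                perFile.put(e.getKey().substring(marker.length()), e.getValue());
            }
        }
        List<Long> sorted = new ArrayList<>(perFile.values());
        Collections.sort(sorted);
        long median = sorted.get(sorted.size() / 2);
        Map.Entry<String, Long> worst =
            Collections.max(perFile.entrySet(), Map.Entry.comparingByValue());
        return Map.entry(worst.getKey(), worst.getValue().doubleValue() / median);
    }

    public static void main(String[] args) {
        Map<String, Long> m = new TreeMap<>();
        m.put("store_seek", 5L);
        m.put("store_seek_ns", 8_862_786L);
        m.put("store_reseek", 164L);
        m.put("store_reseek_ns", 2_969_978L);
        m.put("block_unpacks", 13L);
        m.put("block_unpack_ns", 10_256L);
        m.put("store_seek_ns__1c930a35ff8041368a05817adbdcce97", 830_421L);
        m.put("store_seek_ns__2644194fdf794815abdc940c183dab88", 585_899L);
        m.put("store_seek_ns__32ce31753fb244668f788fb94ab02dff", 483_605L);
        m.put("store_seek_ns__61c8423b9d8846c99a61cd2996b5b621", 5_958_072L);
        m.put("store_seek_ns__f4f7878c9fcf40d9902416d5c7a4097a", 1_004_789L);

        System.out.printf(Locale.ROOT, "seek: %.1f ns/op, reseek: %.1f ns/op, unpack: %.1f ns/block%n",
            nanosPerOp(m, "store_seek", "store_seek_ns"),
            nanosPerOp(m, "store_reseek", "store_reseek_ns"),
            nanosPerOp(m, "block_unpacks", "block_unpack_ns"));
        Map.Entry<String, Double> w = worstStoreFile(m, "store_seek_ns");
        System.out.printf(Locale.ROOT, "slowest seeks: %s (%.2fx median)%n", w.getKey(), w.getValue());
    }
}
```

On these numbers a seek averages about 1.77 ms while a reseek averages about 18 µs, and store file 61c8423b9d8846c99a61cd2996b5b621 spends roughly 7.2 times the median per-file seek time, which is the kind of outlier the description mentions.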
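One way to keep the overhead conditional, as the description asks, is to choose the recorder once per scan: an untraced scan gets a shared no-op instance, so it performs no clock reads and no map updates, and only a traced scan pays for `System.nanoTime()` and bookkeeping. This is a hedged sketch; `GatedScanRecording`, `Recorder`, `LiveRecorder` and `forScan` are hypothetical names, and where the `traced` flag comes from is left open (in an OpenTelemetry-instrumented path it might be derived from `Span.current().isRecording()`, but that wiring is an assumption). The clock is injected as a `LongSupplier` so the timing logic can be tested deterministically.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

// Hypothetical sketch: pick the recorder once per scan so that untraced
// scans never read the clock or touch a map.
final class GatedScanRecording {

    interface Recorder {
        long start();                              // timestamp to pass back to stop()
        void stop(String metric, long startNanos);
    }

    // Shared no-op used for scans that are not being traced.
    static final Recorder NOOP = new Recorder() {
        @Override public long start() { return 0L; }
        @Override public void stop(String metric, long startNanos) { }
    };

    // Live recorder for traced scans: count under "<metric>", time under "<metric>_ns".
    static final class LiveRecorder implements Recorder {
        private final LongSupplier clock;
        private final Map<String, Long> values = new TreeMap<>();

        LiveRecorder(LongSupplier clock) { this.clock = clock; }

        @Override public long start() { return clock.getAsLong(); }

        @Override public void stop(String metric, long startNanos) {
            values.merge(metric, 1L, Long::sum);
            values.merge(metric + "_ns", clock.getAsLong() - startNanos, Long::sum);
        }

        Map<String, Long> snapshot() { return new TreeMap<>(values); }
    }

    // Decided once per scan; 'traced' is assumed to come from the tracing layer.
    static Recorder forScan(boolean traced, LongSupplier clock) {
        return traced ? new LiveRecorder(clock) : NOOP;
    }

    public static void main(String[] args) {
        Recorder rec = forScan(true, System::nanoTime);
        long t = rec.start();
        // ... a store file seek would happen here ...
        rec.stop("store_seek", t);
        if (rec instanceof LiveRecorder) {
            System.out.println(((LiveRecorder) rec).snapshot().keySet());
        }
    }
}
```

The remaining cost on the untraced path is a call to an empty method per operation; whether that is negligible in the scanner hot path is exactly what the careful measurement mentioned in the description would need to establish.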
--
This message was sent by Atlassian Jira
(v8.20.10#820010)