(couchdb) branch couch-stats-resource-tracker-v3-rebase-http updated: WIP: CSRT.md and some cleanup

chewbranca Thu, 03 Jul 2025 16:08:09 -0700

This is an automated email from the ASF dual-hosted git repository.

chewbranca pushed a commit to branch couch-stats-resource-tracker-v3-rebase-http
in repository https://gitbox.apache.org/repos/asf/couchdb.git



The following commit(s) were added to 
refs/heads/couch-stats-resource-tracker-v3-rebase-http by this push:
     new a5293b258 WIP: CSRT.md and some cleanup
a5293b258 is described below

commit a5293b25818e5d25e087d170802621fd128df24e
Author: Russell Branca <[email protected]>
AuthorDate: Thu Jul 3 16:07:51 2025 -0700

    WIP: CSRT.md and some cleanup
---
 rel/overlay/etc/default.ini                        |   6 +-
 src/couch_stats/CSRT.md                            | 844 +++++++++++++++++++++
 .../src/couch_stats_resource_tracker.hrl           |   2 +-
 src/couch_stats/src/csrt_logger.erl                |   6 +-
 src/couch_stats/src/csrt_util.erl                  |   2 +
 5 files changed, 855 insertions(+), 5 deletions(-)

diff --git a/rel/overlay/etc/default.ini b/rel/overlay/etc/default.ini
index d9730a42f..b3ea606f9 100644
--- a/rel/overlay/etc/default.ini
+++ b/rel/overlay/etc/default.ini
@@ -1140,12 +1140,12 @@ enable_reporting = true
 [csrt.init_p]
 fabric_rpc__all_docs = true
 fabric_rpc__changes = true
-fabric_rpc__map_view = true
-fabric_rpc__reduce_view = true
 fabric_rpc__get_all_security = true
+fabric_rpc__map_view = true
 fabric_rpc__open_doc = true
-fabric_rpc__update_docs = true
 fabric_rpc__open_shard = true
+fabric_rpc__reduce_view = true
+fabric_rpc__update_docs = true
 
 ; CSRT dbname matchers
 ; Given a dbname and a positive integer, this will enable an IO matcher
diff --git a/src/couch_stats/CSRT.md b/src/couch_stats/CSRT.md
index 073e89e09..420a32b95 100644
--- a/src/couch_stats/CSRT.md
+++ b/src/couch_stats/CSRT.md
@@ -1 +1,845 @@
 # Couch Stats Resource Tracker (CSRT)
+
+TODO: add list of stat fields
+
+# CSRT Config Keys
+
+## -define(CSRT, "csrt").
+
+> config:get("csrt").
+
+Primary CSRT config namespace: contains core settings for enabling different
+layers of functionality in CSRT, along with global config settings for limiting
+data volume generation.
+
+## -define(CSRT_INIT_P, "csrt.init_p").
+
+> config:get("csrt.init_p").
+
+Config toggles for tracking counters on spawning of RPC `fabric_rpc` workers by
+way of `rexi_server:init_p`. This allows us to conditionally enable new metrics
+for the desired RPC operations in an expandable manner, without having to add
+new stats for every single potential RPC operation. These are for the 
individual
+metrics to track, the feature is enabled by way of the config toggle
+`config:get(?CSRT, "enable_init_p")`, and these configs can left alone for the
+most part until new operations are tracked.
+
+## -define(CONF_MATCHERS_ENABLED, "csrt_logger.matchers_enabled").
+
+> config:get("csrt_logger.matchers_enabled").
+
+Config toggles for enabling specific builtin logger matchers, see the dedicated
+section below on `# CSRT Default Matchers`.
+
+## -define(CONF_MATCHERS_THRESHOLD, "csrt_logger.matchers_threshold").
+
+> config:get("csrt_logger.matchers_threshold").
+
+Config settings for defining the primary `Threshold` value of the builtin 
logger
+matchers, see the dedicated section below on `# CSRT Default Matchers`.
+
+## -define(CONF_MATCHERS_DBNAMES, "csrt_logger.dbnames_io").
+
+> config:get("csrt_logger.matchers_enabled").
+
+Config section for setting `$db_name = $threshold` resulting in instantiating a
+"dbname_io" logger matcher for each `$db_name` that will generate a CSRT
+lifecycle report for any contexts that that induced more operations on _any_ 
one
+field of `ioq_calls|get_kv_node|get_kp_node|docs_read|rows_read` that is 
greater
+than `$threshold` and is on database `$db_name`.
+
+This is basically a simple matcher for finding heavy IO requests on a 
particular
+database, in a manner amenable to key/value pair specifications in this .ini
+file until a more sophisticated declarative model exists. In particular, it's
+not easy to sequentially generate matchspecs by way `ets:fun2ms/1`, and so an
+alternative mechanism for either dynamically assembling an `#rctx{}` to match
+against or generating the raw matchspecs themselves is warranted.
+
+
+# CSRT Code Markers
+
+## -define(CSRT_ETS, csrt_server).
+
+This is the reference to the CSRT ets table, it's managed by `csrt_server` so
+that's where the name originates from.
+
+## -define(MATCHERS_KEY, {csrt_logger, all_csrt_matchers}).
+
+This marker is where the active matchers are written to in `persisten_term` for
+concurrently and parallelly and accessing the logger matchers in the CSRT
+tracker processes for lifecycle reporting.
+
+# CSRT Process Dictionary Markers
+
+## -define(DELTA_TA, {csrt, delta_ta}).
+
+This stores our last delta snapshot to track progress since the last 
incremental
+streaming of stats back to the coordinator process. This will be updated after
+the next delta is made with the latest value. Eg this stores `T0` so we can do
+`T1 = get_resource()` `make_delta(T0, T1)` and then we save `T1` as the new 
`T0`
+for use in our next delta.
+
+## -define(LAST_UPDATED, {csrt, last_updated}).
+
+This stores the integer corresponding to the `erlang:monotonic_time()` value of
+the most recent `updated_at` value. Basically this lets us utilize a pdict
+value to be able to turn `update_at` tracking into an incremental operation 
that
+can be chained in the existing atomic `ets:update_counter` and
+`ets:update_element` calls.
+
+The issue being that our updates are of the form `+2 to ioq_calls for 
$pid_ref`,
+which ets does atomically in a guaranteed `atomic` and `isolated` manner. The
+strict use of the atomic operations for tracking these values is why this
+system works effeciently at scale. This means that we can increment counters on
+all of the stats counter fields in a batch, very quickly, but for tracking
+`updated_at` timestamps we'd need to either do an extra ets call to get the 
last
+`updated_at` value, or do an extra ets call to `ets:update_element` to set the
+`updated_at` value to `csrt_util:tnow()`. The core problem with this is that 
the
+batch inc operation is essentially the only write operation performed after the
+initial context setting of dbname/handler/etc; this means that we'd literally
+double the number of ets calls induced to track CSRT updates, just for tracking
+the `updated_at`. So instead, we rely on the fact that the local process
+corresponding to `$pid_ref` is the _only_ process doing updates so we know the
+last `updated_at` value will be the last time this process updated the data. So
+we track that value in the pdict and then take a delta between `tnow()` and
+`updated_at`, and then `updated_at` becomes a value we can sneak into the other
+integer ounter updates we're already performing!
+
+## -define(PID_REF, {csrt, pid_ref}).
+
+This marker is for the core storing the core `PidRef` identifier. The key idea
+here is that a lifecycle is a context lifecycle is contained to within the 
given
+`PidRef`, meaning that a `Pid` can instantiate different CSRT lifecycles and
+pass those to different workers.
+
+This is specifically necessary for long running processes that need to handle
+many CSRT context lifecycles over the course of that individual process's
+lifecycle independent. In practice, this is immediately needed for the actual
+coordinator lifecycle tracking, as `chttpd` uses a worker pool of http request
+handlers that can be re-used, so we need a way to create a CSRT lifecycle
+corresponding to the given request currently being serviced. This is also
+intended to be used in other long running processes, like IOQ or `couch_js` 
pids
+such that we can track the specific context inducing the operations on the
+`couch_file` pid or indexer or replicator or whatever.
+
+Worker processes have a more clear cut lifecycle, but either style of process
+can be exit'ed in a manner that skips the ability to do cleanup operations, so
+additionally there's a dedicated tracker process spawned to monitor the process
+that induced the CSRT context such that we can do the dynamic logger matching
+directly in these tracker processes and also we can properly cleanup the ets
+entries even if the Pid crashes.
+
+## -define(TRACKER_PID, {csrt, tracker}).
+
+A handle to the spawned tracker process that does cleanup and logger matching
+reprots at the end of the process lifecycle. We store a reference to the 
tracker
+pid so that for explicit context destruction, like in `chttpd` workers after a
+request has been serviced, we can update stop the tracker and perform the
+expected cleanup directly.
+
+# #rctx{} fields
+
+# rexi_server:init_p -- rename/clarify this
+
+# Primary Config Toggles
+
+# CSRT (?CSRT="csrt") Config Settings
+
+## config:get(?CSRT, "enable", false).
+
+Core enablement toggle for CSRT, defaults to false. Enabling this setting
+intiates local CSRT stats collection as well as shipping deltas in RPC
+responses to accumulate in the coordinator.
+
+This does _not_ trigger the new RPC spawn metrics, and it does not enable
+reporting for any of the rctx types.
+
+*NOTE*: you *MUST* have all nodes in the cluster running a CSRT aware CouchDB
+_before_ you enable it on any node, otherwise the old version nodes won't know
+how to handle the new RPC formats including an embedded Delta payload.
+
+## config:get(?CSRT, "enable_init_p", false).
+
+Enablement of tracking new metric counters for different `fabric_rpc` 
operations
+types to track spawn rates of RPC work induced across the cluster. There is
+corresponding config lookups into the `?CSRT_INIT_P` namespace for keys of the
+form: `atom_to_list(Mod) ++ "__" atom_to_list(Fun)`, eg 
`"fabric_rpc__open_doc"`
+for enabling the specific RPC endpoints.
+
+However, those individual settings can be ignored and this top level config
+toggle is what should be used in general, as the function specific config
+toggles predominantly exist to enable tracking a subet of total RPC operations
+in the cluster, and new endpoints can be added here.
+
+## config:get(?CSRT, "enable_reporting", false).
+
+This is the primary toggle for enabling CSRT process lifetime reports 
containing
+detailed information about the quantity of work induced by the given
+request/worker/etc. This is the top level toggle for enabling _any_ reporting,
+and there also exists `config:get(?CSRT, "enable_rpc_reporting", false).` to
+disable the reporting of any individual RPC workers, leaving the coordinator
+responsible of generating a report with the accumulated deltas.
+
+## config:get(?CSRT, "enable_rpc_reporting", false).
+
+This enables the possibility of RPC workers generating reports. They still need
+to hit the configured thresholds to induce a report, but this will generate 
CSRT
+process lifetime reports for individual RPC workers that trigger the configured
+logger thresholds. This allows for quantifying per node resource usage when
+desired, as otherwise the reports are at the http request level and don't
+provide per node stats.
+
+The key idea here is that having RPC level CSRT process lifetime reporting is
+incredibly useful, but can also generate large quantities of data. For example,
+a view query on a Q=64 database will stream results from 64 shard replicas,
+resulting in at least 64 RPC reports, plus any that might have been generated
+from RPC workers that "lost" the race for shard replica. This is very useful,
+but a lot of data given the verbose nature of funneling it through the RSyslog
+reports, however, the ability to write directly to something like ClickHouse or
+another columnar store would be great.
+
+Until there's an efficient storage mechanism to stream the results to, the
+rsyslog entries work great and are very practical, but care must be taken to
+not generate too much data for aggregate queries as they generate at least `Qx`
+more report than an individual report per http request from the coordinator.
+This setting exists as a way to either a) utilize the logger matcher configured
+thresholds to allow for _any_ rctx's to be recorded when they induce heavy
+operations, either Coordinator or RPC worker; or b) to _only_ log workloads at
+the coordinator level.
+
+NOTE: this setting exists because we lack an expressive enough config
+declaration to easily chain the matchspec constructions as `ets:fun2ms/1` is a
+special compile time parse transform macro that requires the fully definition 
to
+be specified directly, it cannot be iteractively constructed. That said, you
+_can_ register matchers through remsh with more specific and fine grained
+pattern matching, and a more expressive system for defining matchers are being
+explored.
+
+## config:get_boolean(?CSRT, "should_truncate_reports", true)
+
+Enables truncation of the CSRT process lifetime reports to not include any
+fields that are zero at the end of process lifetime, eg don't include
+`js_filter=0` in the report if the request did not induce Javascript filtering.
+
+This can be disabled if you really care about consistent fields in the report
+logs, but this is a log space saving mechanism, similar to disabling RPC
+reporting by default, as its a simple way to reduce overall volume
+
+## config:get(?CSRT, "randomize_testing", true).
+
+This is a `make eunit` only feature toggle that will induce randomness into the
+cluster's `csrt:is_enabled()` state, specifically to utilize the test suite to
+exercise edge case scenarios and failures when CSRT is only conditionally
+enabled, ensuring that it gracefuly and robustly handles errors without fallout
+to the underlying http clients.
+
+The idea here is to introduce randomness into whether CSRT is enabled across 
all
+the nodes to simulate clusters with heterogeneous CSRT enablement and also to
+ensure that CSRT works properly when toggled on/off wihout causing any
+unexpected fallout to the client requests.
+
+This is a config toggle specifically so that the actual CSRT tests can disable
+it for making accurate assertions about resource usage traacking, and is not
+intended to be used directly.
+
+## config:get_integer(?CSRT, "query_limit", ?QUERY_LIMIT)
+
+TODO: fill in with http updates
+
+# CSRT_INIT_P (?CSRT_INIT_P="csrt.init_p") Config Settings
+
+## config:get(?CSRT_INIT_P, ModFunName, false).
+
+These config toggles exist to conditionaly enable additional tracking of RPC
+endpoints of interest, but rather it's a way to selectively enable tracking for
+a subset of RPC operations, in a way we can extend later to add more. The
+`ModFunName` is of the form `atom_to_list(Mod) ++ "__" atom_to_list(Fun)`, eg
+`"fabric_rpc__open_doc"`, and right now, only exists for `fabric_rpc` modules.
+
+NOTE: this is a bit awkward and isn't meant to be used directly, instead,
+utilize `config:set(?CSRT, "enable_init_p", "true").` to enable or disable 
these
+as a whole.
+
+The current set of operations, as copied in from `default.ini`
+
+```
+[csrt.init_p]
+fabric_rpc__all_docs = true
+fabric_rpc__changes = true
+fabric_rpc__get_all_security = true
+fabric_rpc__map_view = true
+fabric_rpc__open_doc = true
+fabric_rpc__open_shard = true
+fabric_rpc__reduce_view = true
+fabric_rpc__update_docs = true
+```
+
+# CSRT Logger Matcher Enablement and Thresholds
+
+There are currently six builtin default loggers designed to make it easy to do
+filtering on heavy resource usage inducing and long running requests. These are
+designed as a simple baseline of useful matchers, declared in a manner amenable
+to `default.ini` based constructs. More expressive matcher declarations are
+being explored, and matchers of arbitrary complexity can be registered directly
+through remsh. The default matchers are all designed around an integer config
+Threshold that triggers on a specific field, eg docs read, or on a delta of
+fields for long requests and changes requests that process many rows but return
+few.
+
+The current default matchers are:
+
+  * docs_read: match all requests reading more than N docs
+  * rows_read: match all requests reading more than N rows
+  * docs_written: match all requests writing more than N docs
+  * long_reqs: match all requests lasting more than N milliseconds
+  * changes_processed: match all changes requests that returned at least N rows
+    less than was necessarily loaded to complete the request (eg find heavy
+    filtered changes requests reading many rows but returning few).
+  * ioq_calls: match all requests inducing more than N ioq_calls
+
+Each of the default matchers has an enablement setting in
+`config:get(?CONF_MATCHERS_ENABLED, Name)` for toggling enablement of it, and a
+corresponding threshold value setting in `config:get(?CONF_MATCHERS_THRESHOLD,
+Name)` that is an integer value corresponding to the specific nature of that
+matcher.
+
+
+## CSRT Logger Matcher Enablement (?CONF_MATCHERS_ENABLED)
+
+> -define(CONF_MATCHERS_THRESHOLD, "csrt_logger.matchers_enabled").
+
+
+### config:get_boolean(?CONF_MATCHERS_ENABLED, "docs_read", false)
+
+Enable the `docs_read` builtin matcher, with a default `Threshold=1000`, such
+that any request that reads more than `Threshold` docs will generate a CSRT
+process lifetime report with a summary of its resouce consumption.
+
+This is different from the `rows_read` filter in that a view with `?limit=1000`
+will read 1000 rows, but the same request with `?include_docs=true` will also
+induce an additional 1000 docs read.
+
+### config:get_boolean(?CONF_MATCHERS_ENABLED, "rows_read", false)
+
+Enable the `rows_read` builtin matcher, with a default `Threshold=1000`, such
+that any request that reads more than `Threshold` rows will generate a CSRT
+process lifetime report with a summary of its resouce consumption.
+
+This is different from the `docs_read` filter so that we can distinguish 
between
+heavy view requests with lots of rows or heavy requests with lots of docs.
+
+### config:get_boolean(?CONF_MATCHERS_ENABLED, "docs_written", false)
+
+Enable the `docs_written` builtin matcher, with a default `Threshold=500`, such
+that any request that writtens more than `Threshold` docs will generate a CSRT
+process lifetime report with a summary of its resouce consumption.
+
+### config:get_boolean(?CONF_MATCHERS_ENABLED, "ioq_calls", false)
+
+Enable the `ioq_calls` builtin matcher, with a default `Threshold=10000`, such
+that any request that induces more than `Threshold` IOQ calls will generate a
+CSRT process lifetime report with a summary of its resouce consumption.
+
+### config:get_boolean(?CONF_MATCHERS_ENABLED, "long_reqs", false)
+
+Enable the `long_reqs` builtin matcher, with a default `Threshold=60000`, such
+that any request where the the last CSRT rctx `updated_at` timestamp is at 
least
+`Threshold` milliseconds grather than the `started_at timestamp` will generate 
a
+CSRT process lifetime report with a summary of its resource consumption.
+
+
+## CSRT Logger Matcher Threshold (?CONF_MATCHERS_THRESHOLD)
+
+> -define(CONF_MATCHERS_THRESHOLD, "csrt_logger.matchers_threshold").
+
+
+### config:get_integer(?CONF_MATCHERS_THRESHOLD, "docs_read", 1000)
+
+Threshold for `docs_read` logger matcher, defaults to `1000` docs read.
+
+### config:get_integer(?CONF_MATCHERS_THRESHOLD, "rows_read", false)
+
+Threshold for `rows_read` logger matcher, defaults to `1000` rows read.
+
+### config:get_integer(?CONF_MATCHERS_THRESHOLD, "docs_written", false)
+
+Threshold for `docs_written` logger matcher, defaults to `500` docs written.
+
+### config:get_integer(?CONF_MATCHERS_THRESHOLD, "ioq_calls", false)
+
+Threshold for `ioq_calls` logger matcher, defaults to `10_000` IOQ calls made.
+
+### config:get_integer(?CONF_MATCHERS_THRESHOLD, "long_reqs", false)
+
+Threshold for `long_reqs` logger matcher, defaults to `60_000` milliseconds.
+
+
+# Core CSRT API
+
+The `csrt(.erl)` module is the primary entry point into CSRT, containing API
+functionality for tracking the lifecycle of processes, inducing metric tracking
+over that lifecycle, and also a variety of functions for aggregate querying.
+
+It's worth noting that the CSRT context tracking functions are specifically
+designed to not `throw` and be safe in the event of unexpected CSRT failures or
+edge cases. The aggregate query API has some callers that will actually throw,
+but aside from this core CSRT operations will not bubble up exceptions, and 
will
+either return the error value, or catch the error and move on rather than
+chaining further errors.
+
+## PidRef API
+
+These are functions are CRUD operations around creating and storing the CSRT
+`PidRef` handle.
+
+```
+-export([
+    destroy_pid_ref/0,
+    destroy_pid_ref/1,
+    create_pid_ref/0,
+    get_pid_ref/0,
+    get_pid_ref/1,
+    set_pid_ref/1
+]).
+```
+
+## Context Lifecycle API
+
+These are the CRUD functions for handling a CSRT context lifecycle, where a
+lifecycle context is created in a `chttpd` coordinator process by way of
+`csrt:create_coordinator_context/2`, or in `rexi_server:init_p` by way of
+`csrt:create_worker_context/3`. Additional functions are exposed for setting
+context specific info like username/dbname/handler. `get_resource` fetches the
+context being tracked corresponding to the given `PidRef`.
+
+```
+-export([
+    create_context/2,
+    create_coordinator_context/2,
+    create_worker_context/3,
+    destroy_context/0,
+    destroy_context/1,
+    get_resource/0,
+    get_resource/1,
+    set_context_dbname/1,
+    set_context_dbname/2,
+    set_context_handler_fun/1,
+    set_context_handler_fun/2,
+    set_context_username/1,
+    set_context_username/2
+]).
+```
+
+## Public API
+
+The "Public" or miscellaneous API for lack of a better name. These are various
+functions exposed for wider use and/or testing purposes.
+
+```
+-export([
+    clear_pdict_markers/0,
+    do_report/2,
+    is_enabled/0,
+    is_enabled_init_p/0,
+    maybe_report/2,
+    to_json/1
+]).
+```
+
+## Stats Collection API
+
+This is the stats collection API utilized by way of
+`couch_stats:increment_counter` to do local process tracking, and also in 
`rexi`
+to adding and extracting delta contexts and then accumulating those values.
+
+NOTE: `make_delta/0` is a "destructive" operation that will induce a new delta
+by way of the last local pdict's rctx delta snapshot, and then update to the
+most recent version. Two individual rctx snapshots for a PidRef can safely
+generate an actual delta by way of `csrt_util:rctx_delta/2`.
+
+```
+-export([
+    accumulate_delta/1,
+    add_delta/2,
+    docs_written/1,
+    extract_delta/1,
+    get_delta/0,
+    inc/1,
+    inc/2,
+    ioq_called/0,
+    js_filtered/1,
+    make_delta/0,
+    rctx_delta/2,
+    maybe_add_delta/1,
+    maybe_add_delta/2,
+    maybe_inc/2,
+    should_track_init_p/1
+]).
+```
+
+## TODO: RPC/QUERY DOCS
+
+```
+%% RPC API
+-export([
+    rpc/2,
+    call/1
+]).
+
+%% Aggregate Query API
+-export([
+    active/0,
+    active/1,
+    active_coordinators/0,
+    active_coordinators/1,
+    active_workers/0,
+    active_workers/1,
+    count_by/1,
+    find_by_nonce/1,
+    find_by_pid/1,
+    find_by_pidref/1,
+    find_workers_by_pidref/1,
+    group_by/2,
+    group_by/3,
+    query/1,
+    query/2,
+    query_matcher/1,
+    query_matcher/2,
+    sorted/1,
+    sorted_by/1,
+    sorted_by/2,
+    sorted_by/3
+]).
+```
+
+## Recon API Ports of https://github.com/ferd/recon/releases/tag/2.5.6
+
+This is a "port" of `recon:proc_window` to `csrt:proc_window`, allowing for
+`proc_window` style aggregations/sorting/filtering but with the stats fields
+collected by CSRT! This is also a direct port of `recon:proc_window` in that it
+utilizes the same underlying logic and effecient internal data structures as
+`recon:proc_window`, but rather only changes the Sample function:
+
+```erlang
+%% This is a recon:proc_window/3 [1] port with the same core logic but
+%% recon_lib:proc_attrs/1 replaced with pid_ref_attrs/1, and returning on
+%% pid_ref() rather than pid().
+%% [1] 
https://github.com/ferd/recon/blob/c2a76855be3a226a3148c0dfc21ce000b6186ef8/src/recon.erl#L268-L300
+-spec proc_window(AttrName, Num, Time) -> term() | throw(any()) when
+    AttrName :: rctx_field(), Num :: non_neg_integer(), Time :: pos_integer().
+proc_window(AttrName, Num, Time) ->
+    Sample = fun() -> pid_ref_attrs(AttrName) end,
+    {First, Last} = recon_lib:sample(Time, Sample),
+    recon_lib:sublist_top_n_attrs(recon_lib:sliding_window(First, Last), Num).
+```
+
+In particular, our change is `Sample = fun() -> pid_ref_attrs(AttrName) end,`,
+and in fact, if recon upstream parameterized the option of `AttrName` or
+`SampleFunction`, this could be reimplemented as:
+
+```erlang
+%% csrt:proc_window
+proc_window(AttrName, Num, Time) ->
+    Sample = fun() -> pid_ref_attrs(AttrName) end,
+    recon:proc_window(Sample, Num, Time).
+```
+
+This implementation is being highlighted here because `recon:proc_window/3` is
+battle hardened and `recon_lib:sliding_window` uses an effecient internal data
+structure for storing the two samples that has been proven to work in 
production
+systems with millions of active processes, so swapping the `Sample` function
+with a CSRT version allows us to utilize the production grade recon
+functionality, but extended out to the particular CouchDB statistics we're
+esepecially interested in.
+
+And on a fun note: any further stats tracking fields added to CSRT tracking 
will
+automatically work with this too.
+
+
+```
+-export([
+    pid_ref_attrs/1,
+    pid_ref_matchspec/1,
+    proc_window/3
+]).
+```
+
+<hr />
+
+# Core types and Maybe types
+
+Before we look at the `#rctx{}` record fields, lets examine the core datatypes
+defined by CSRT for use in Dialyzer typespecs. There are more, but these are 
the
+essentials and demonstrate the "maybe" typespec approach utilized in CSRT.
+
+Let's say we have a `-type foo() :: #foo{}` and `-type maybe_foo() :: foo() |
+undefined`, we then can construct functions of the form `-spec get_foo(id()) ->
+maybe_foo()` and then we can use Dialyzer to statically assert all callers of
+`get_foo/1` handle the `maybe_foo()` data type rather than just `foo()` and
+ensure that all subsequent callers do as well.
+
+This approach of `-spec maybe_<Type> :: <Type> | undefined` is utilized
+throughout CSRT and has greatly aided in the development, refactoring, and
+static analysis of this system. Here's a useful snippet for running Dialyzer
+while hacking on CSRT:
+
+> make && time make dialyze apps=couch_stats
+
+```erlang
+-type pid_ref() :: {pid(), reference()}.
+-type maybe_pid_ref() :: pid_ref() | undefined.
+
+-type coordinator_rctx() :: #rctx{type :: coordinator()}.
+-type rpc_worker_rctx() :: #rctx{type :: rpc_worker()}.
+-type rctx() :: #rctx{} | coordinator_rctx() | rpc_worker_rctx().
+-type rctxs() :: [#rctx{}] | [].
+-type maybe_rctx() :: rctx() | undefined.
+```
+
+Above we have the core `pid_ref()` data type, which is just a tuple with a
+`pid()` and a `reference()`, and naturally, `maybe_pid_ref()` handles the
+optional presence of a `pid_ref()`, allowing for our APIs like
+`csrt:get_resource(maybe_pidref())` to handle ambiguity of the presence of a
+`pid_ref()`.
+
+We define our core `rctx()` data type as an empty `#rctx{}`, or the more
+specific `coordinator_rctx()` or `rpc_worker_rctx()` such that we can be
+specific about the `rctx()` type in functions that need to distinguish. And 
then
+as expected, we have the notion of `maybe_rctx()`.
+
+# #rctx{}
+
+This is the core data structure utilized to track a CSRT context for a
+coordinator or rpc_worker process, represented by the `#rctx{}` record, and
+stored in the `?CSRT_ETS` table keyed on `{keypos, #rctx.pid_ref}`.
+
+The Metadata fields store labeling data for the given process being tracked,
+such as started_at and updated_at timings, the primary `pid_ref` id key, the
+type of the process context, and some additional information like username,
+dbname, and the nonce of the coordinator request.
+
+The Stats Counters fields are `non_neg_integer()` monotonically increasing
+counters corresponding to the `couch_stats` metrics counters we're interested 
in
+tracking at a process level cardinality. The use of these purely integer 
counter
+fields represented by a record represented in an ets table is the cornerstone 
of
+CSRT and why its able to operate at high throughput and high concurrency, as
+`ets:update_counter/{3,4}` take increment operations to be performed atomically
+and in isolation, in a manner in which does not require fetching and loading 
the
+data directly. We then take care to batch the accumulation of delta updates 
into
+a single `update_counter` call and even sneak in the `updated_at` tracking as a
+integer counter update without inducing an extra ets call.
+
+NOTE: the typespec's for these fields include `'_'` atoms as possible types as
+that is the matchspec wildcard any of the fields can be set to when using an
+existing `#rctx{}` record to search with.
+
+
+```erlang
+-record(rctx, {
+    %% Metadata
+    started_at = csrt_util:tnow() :: integer() | '_',
+    %% NOTE: updated_at must be after started_at to preserve time congruity
+    updated_at = csrt_util:tnow() :: integer() | '_',
+    pid_ref :: maybe_pid_ref() | {'_', '_'} | '_',
+    nonce :: nonce() | undefined | '_',
+    type :: rctx_type() | undefined | '_',
+    dbname :: dbname() | undefined | '_',
+    username :: username() | undefined | '_',
+
+    %% Stats Counters
+    db_open = 0 :: non_neg_integer() | '_',
+    docs_read = 0 :: non_neg_integer() | '_',
+    docs_written = 0 :: non_neg_integer() | '_',
+    rows_read = 0 :: non_neg_integer() | '_',
+    changes_returned = 0 :: non_neg_integer() | '_',
+    ioq_calls = 0 :: non_neg_integer() | '_',
+    js_filter = 0 :: non_neg_integer() | '_',
+    js_filtered_docs = 0 :: non_neg_integer() | '_',
+    get_kv_node = 0 :: non_neg_integer() | '_',
+    get_kp_node = 0 :: non_neg_integer() | '_'
+    %% "Example to extend CSRT"
+    %%write_kv_node = 0 :: non_neg_integer() | '_',
+    %%write_kp_node = 0 :: non_neg_integer() | '_'
+}).
+
+## Metadata
+
+We use `csrt_util:tnow()` for time tracking, which is a `native` format
+`erlang:monotonic_time()` integer, which, noteably, _can_ be and is often a
+negative value. You must either take a delta or convert the time to get into a
+useable format, as one might suspect by the use of `native`.
+
+We make use of `erlang:mononotic_time/0` as per the recommendation in
+https://www.erlang.org/doc/apps/erts/time_correction.html#how-to-work-with-the-new-api
+for the suggested way to `Measure Elasped Time`, as quoted:
+
+```
+Take time stamps with erlang:monotonic_time/0 and calculate the time difference
+using ordinary subtraction. The result is in native time unit. If you want to
+convert the result to another time unit, you can use 
erlang:convert_time_unit/3.
+
+An easier way to do this is to use erlang:monotonic_time/1 with the desired 
time
+unit. However, you can then lose accuracy and precision.
+```
+
+So our `csrt_util:tnow/0` is implemented as the following, and we store
+timestamps in `native` format as long as possible to avoid precision loss at
+higher units of time, eg 300 microseconds is zero milliseconds.
+
+```
+-spec tnow() -> integer().
+tnow() ->
+    erlang:monotonic_time().
+```
+
+We store timestamps in the node's local erlang representation of time,
+specifically to be able to effeciently do time deltas, and then we track time
+deltas from the local node's perspective to not send timestamps across the 
wire.
+We then utilize `calendar:system_time_to_rfc3339` to convert the local node's
+native time representation to its corresponding time format when we generate 
the
+process life cycle reports or send an http response.
+
+NOTE: because we do an inline definition and assignment of the
+`#rctx.started_at` and `#rctx.updated_at` fields to `csrt_util:tnow()`, we
+_must_ declare `#rctx.updated_at` *after* `#rctx.started_at` to avoid
+fundamental time incongruenties.
+
+### #rctx.started_at = csrt_util:tnow() :: integer() | '_',
+
+A static value corresponding to the local node's Erlang monotonic_time at which
+this context was created.
+
+### #rctx.updated_at = csrt_util:tnow() :: integer() | '_',
+
+A dynamic value corresponding to the local node's Erlang monotonic_time at 
which
+this context was updated. Note: unlike `#rctx.started_at`, this value will
+update over time, and in the process lifecycle reports the `#rctx.updated_at`
+value corresponds to the point at which the context was destroyed, allowing for
+calculation of the total duration of the request/context.
+
+### #rctx.pid_ref :: maybe_pid_ref() | {'_', '_'} | '_',
+
+The primary identifier used to track the resources consumed by a given `pid()`
+for a specific context identified with a `make_ref()`, and combined together as
+unit as a given `pid()`, eg the `chttpd` worker pool, can have many contexts
+over time.
+
+### #rctx.nonce :: nonce() | undefined | '_',
+
+The `Nonce` value of the http request being serviced by the 
`coordinator_rctx()`
+used as the primary grouping identifier of workers across the cluster, as the
+`Nonce` is funneled through `rexi_server`.
+
+### #rctx.type :: rctx_type() | undefined | '_',
+
+A subtype classifier for the `#rctx{}` contexts, right now only supporting
+`#rpc_worker{}` and `#coordinator{}`, but CSRT was designed to accomodate
+additional context types like `#view_indexer{}`, `#search_indexer{}`,
+`#replicator{}`, `#compactor{}`, `#etc{}`.
+
+### #rctx.dbname :: dbname() | undefined | '_',
+
+The database name, filled in at some point after the initial context creation 
by
+way of `csrt:set_context_dbname/{1,2}`.
+
+### #rctx.username :: username() | undefined | '_',
+
+The requester's username, filled in at some point after the initial context
+creation by way of `csrt:set_context_username/{1,2}`.
+
+## Stats Counters
+
+All of these stats counters are stricly `non_neg_integer()` counter values that
+are monotonically increasing, as we only induce positive counter increment 
calls
+in CSRT. Not all of these values will be nonzero, eg if the context doesn't
+induce Javascript filtering of documents, it won't inc the `#rctx.js_filter`
+field. The `"should_truncate_reports"` config value described in this document
+will conditionally exclude the zero valued fields from being included in the
+process life cycle report.
+
+### #rctx.db_open = 0 :: non_neg_integer() | '_',
+
+> Tracking `couch_stats:increment_counter([couchdb, couch_server, open])
+
+The number of `couch_server:open/2` invocations induced by this context.
+
+### #rctx.docs_read = 0 :: non_neg_integer() | '_',
+
+> Tracking `couch_stats:increment_counter([couchdb, database_reads])
+
+The number of `couch_db:open_doc/3` invocations induced by this context.
+
+### #rctx.docs_written = 0 :: non_neg_integer() | '_',
+
+A phony metric counting docs written by the context, induced by
+`csrt:docs_written(length(Docs0)),` in `fabric_rpc:update_docs/3` as a way to
+count the magnitude of docs written, as the actual document writes happen in 
the
+`#db.main_pid` `couch_db_updater` pid and subprocess tracking is not yet
+supported in CSRT.
+
+This can be replaced with direct counting  once passthrough contexts work.
+
+### #rctx.rows_read = 0 :: non_neg_integer() | '_',
+
+> Tracking `couch_stats:increment_counter([fabric_rpc, changes, processed])
+> also Tracking `couch_stats:increment_counter([fabric_rpc, view, rows_read])
+
+A value tracking multiple possible metrics corresponding to rows streamed in
+aggregate operations. This is used for view_rows/changes_rows/all_docs/etc.
+
+### #rctx.changes_returned = 0 :: non_neg_integer() | '_',
+
+The number of `fabric_rpc:changes_row/2` invocations induced by this context,
+specifically tracking the number of changes rows streamed back to the client
+requeest, allowing for distinguishing between the number of changes processed 
to
+fulfill a request versus the number actually returned in the http response.
+
+### #rctx.ioq_calls = 0 :: non_neg_integer() | '_',
+
+A phony metric counting invocations of `ioq:call/3` induced by this context. As
+with `#rctx.docs_written`, we need a proxy metric to reperesent these calls
+until CSRT context passing is supported so that the `ioq_server` pid and return
+its own delta back to the worker pid.
+
+### #rctx.js_filter = 0 :: non_neg_integer() | '_',
+
+A phony metric counting the number of `couch_query_servers:filter_docs_int/5`
+(eg ddoc_prompt) invocations induced by this context. This is called by way of
+`csrt:js_filtered(length(JsonDocs))` which both increments `js_filter` by 1, 
and
+`js_filtered_docs` by the length of the docs so we can track magnitude of docs
+and doc revs being filtered.
+
+### #rctx.js_filtered_docs = 0 :: non_neg_integer() | '_',
+
+A phony metric counting the quantity of documents filtered by way of
+`couch_query_servers:filter_docs_int/5` (eg ddoc_prompt) invocations induced by
+this context. This is called by way of `csrt:js_filtered(length(JsonDocs))`
+which both increments `#rctx.js_filter` by 1, and `#rctx.js_filtered_docs` by
+the length of the docs so we can track magnitude of docs and doc revs being
+filtered.
+
+### #rctx.get_kv_node = 0 :: non_neg_integer() | '_',
+
+This metric tracks the number of invocations to `couch_btree:get_node/2` in
+which the `NodeType` returned by `couch_file:pread_term/2` is `kv_node`, 
instead
+of `kp_node`.
+
+This provides a mechanism to quantify the impact of document count and document
+size as those values become larger in the logarithmic complexity btree
+algorithms.  size on the logarithmic complexity btree algorithms as the 
database
+btrees grow.
+
+### #rctx.get_kp_node = 0 :: non_neg_integer() | '_'
+
+This metric tracks the number of invocations to `couch_btree:get_node/2` in
+which the `NodeType` returned by `couch_file:pread_term/2` is `kp_node`, 
instead
+of `kv_node`.
+
+This provides a mechanism to quantify the impact of document count and document
+size as those values become larger in the logarithmic complexity btree
+algorithms.  size on the logarithmic complexity btree algorithms as the 
database
+btrees grow.
+
+%% "Example to extend CSRT"
+%%write_kv_node = 0 :: non_neg_integer() | '_',
+%%write_kp_node = 0 :: non_neg_integer() | '_'
diff --git a/src/couch_stats/src/couch_stats_resource_tracker.hrl 
b/src/couch_stats/src/couch_stats_resource_tracker.hrl
index f4a8e37be..011cc07b8 100644
--- a/src/couch_stats/src/couch_stats_resource_tracker.hrl
+++ b/src/couch_stats/src/couch_stats_resource_tracker.hrl
@@ -119,7 +119,7 @@
     dbname :: dbname() | undefined | '_',
     username :: username() | undefined | '_',
 
-    %% Stats counters
+    %% Stats Counters
     db_open = 0 :: non_neg_integer() | '_',
     docs_read = 0 :: non_neg_integer() | '_',
     docs_written = 0 :: non_neg_integer() | '_',
diff --git a/src/couch_stats/src/csrt_logger.erl 
b/src/couch_stats/src/csrt_logger.erl
index b7d69356f..6f1588738 100644
--- a/src/couch_stats/src/csrt_logger.erl
+++ b/src/couch_stats/src/csrt_logger.erl
@@ -121,7 +121,7 @@ tracker({Pid, _Ref} = PidRef) ->
 -spec register_matcher(Name, MSpec) -> ok | {error, badarg} when
     Name :: string(), MSpec :: ets:match_spec().
 register_matcher(Name, MSpec) ->
-    gen_server:call(?MODULE, {register, Name, MSpec}).
+    gen_server:call(?MODULE, {register, Name, MSpec}, infinity).
 
 -spec deregister_matcher(Name :: string()) -> ok.
 deregister_matcher(Name) ->
@@ -362,9 +362,13 @@ matcher_on_nonce(Nonce) ->
 matcher_on_changes_processed(Threshold) when
     is_integer(Threshold) andalso Threshold > 0
 ->
+    %% HACK: because we overload the use of #rctx.rows_read for
+    %% changes_processed, we must specify a direct match against a changes
+    %% context. Fow now, just match on #coordinator's
     ets:fun2ms(
         fun(
             #rctx{
+                type = #coordinator{mod = chttpd_db, func = 
handle_changes_req},
                 rows_read = Processed,
                 changes_returned = Returned
             } = R
diff --git a/src/couch_stats/src/csrt_util.erl 
b/src/couch_stats/src/csrt_util.erl
index c19c10d6f..acfd12ee4 100644
--- a/src/couch_stats/src/csrt_util.erl
+++ b/src/couch_stats/src/csrt_util.erl
@@ -197,6 +197,8 @@ convert_ref(Ref) when is_reference(Ref) ->
 -spec convert_string(Str :: string() | binary() | undefined) -> binary() | 
null.
 convert_string(undefined) ->
     null;
+convert_string(<<"undefined">>) ->
+    null;
 convert_string(Str) when is_list(Str) ->
     list_to_binary(Str);
 convert_string(Bin) when is_binary(Bin) ->

(couchdb) branch couch-stats-resource-tracker-v3-rebase-http updated: WIP: CSRT.md and some cleanup

Reply via email to