nickva commented on code in PR #5602: URL: https://github.com/apache/couchdb/pull/5602#discussion_r2317584655
########## src/docs/src/config/csrt.rst: ########## @@ -0,0 +1,587 @@ +.. Licensed under the Apache License, Version 2.0 (the "License"); you may not +.. use this file except in compliance with the License. You may obtain a copy of +.. the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT +.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +.. License for the specific language governing permissions and limitations under +.. the License. + +.. default-domain:: config +.. highlight:: ini + +.. _config-csrt: + +========================================== +Couch Stats Resource Tracker (CSRT) Config +========================================== + +CSRT configuration options and overview. + +.. seealso:: + + :doc:`/csrt/index` + +CSRT config +=========== + +This section contains the top level enablement and configuration options for CSRT. + +.. config:section:: csrt :: CSRT Primary Configuration + + .. config:option:: enable :: Enable CSRT data collection and RPC deltas + + Core enablement toggle for CSRT, defaults to false. Enabling this + setting initiates local CSRT stats collection as well as shipping deltas + in RPC responses to accumulate in the coordinator. + + This does *not* trigger the new RPC spawn metrics, and it does not + enable reporting for any of the rctx types. + + .. warning:: + + You *MUST* have all nodes in the cluster running a CSRT aware + CouchDB *before* you enable it on any node, otherwise the old + version nodes won't know how to handle the new RPC formats + including an embedded Delta payload. + + Top level CSRT enablement for local data collection and RPC deltas:: + + [csrt] + enable = false + + .. 
config:option:: enable_init_p :: Enable RPC spawn metric tracking + + Enablement of tracking new metric counters for different `fabric_rpc` operations + spawned by way of `rexi_server:init_p/3`. This is the primary mechanism for + inducing database RPC operations within CouchDB, and these init_p metrics aim to + provide node level understandings of the workloads being induced by other + coordinator processes. This is especially relevant for databases on subsets of a + cluster resulting in non-uniform workloads, these metrics are tailored to + provide insight into what work is being spawned on each node in the cluster as a + function of time. + + Enablement for tracking counts of spawned RPC workers:: + + [csrt] + enable_init_p = false + + .. config:option:: enable_reporting :: Enable CSRT Process Lifecyle Reports + + This is the primary toggle for enabling CSRT process lifetime reports + containing detailed information about the quantity of work induced by + the given request/worker/etc. This is the top level toggle for enabling + _any_ reporting, and there also exists + :config:option:`csrt/enable_rpc_reporting` to disable the reporting of + any individual RPC workers, leaving the coordinator responsible of + generating a report with the accumulated deltas. + + .. note:: + + Note that this setting toggles whether or not to generate process + lifecycle reports, but no reports will be generated until logger + matchers have been enabled that trigger a match on CSRT contexts + that have surpassed the configured thresholds. + + Top level toggle for whether any process lifecycle reports are generated:: + + [csrt] + enable_reporting = false + + .. config:option:: enable_rpc_reporting :: Enable RPC process lifecyle reports + + This enables the possibility of RPC workers generating reports. 
They + still need to hit the configured thresholds to induce a report, but + this will generate CSRT process lifetime reports for individual RPC + workers that trigger the configured logger thresholds. This allows for + quantifying per node resource usage when desired, as otherwise the + reports are at the http request level and don't provide per node stats. + + The key idea here is that having RPC level CSRT process lifetime + reporting is incredibly useful, but can also generate large quantities + of data. For example, a view query on a Q=64 database will stream + results from 64 shard replicas, resulting in at least 64 RPC reports, + plus any that might have been generated from RPC workers that "lost" + the race for shard replica. This is very useful, but a lot of data + given the verbose nature of funneling it through the RSyslog reports, + however, the ability to write directly to something like ClickHouse or + another columnar store would be great. + + Until there's an efficient storage mechanism to stream the results to, + the rsyslog entries work great and are very practical, but care must be + taken to not generate too much data for aggregate queries as they + generate at least `Qx` more report than an individual report per http + request from the coordinator. This setting exists as a way to either + a) utilize the logger matcher configured thresholds to allow for _any_ + rctx's to be recorded when they induce heavy operations, either + Coordinator or RPC worker; or b) to _only_ log workloads at the + coordinator level. + + .. note:: + + This setting exists because we lack an expressive enough config + declaration to easily chain the matchspec constructions as + `ets:fun2ms/1` is a special compile time parse transform macro that + requires the full definition to be specified directly, it cannot + be interactively constructed. 
That said, you _can_ register matchers + through `remsh` with more specific and fine grained pattern matching, + and a more expressive system for defining matchers are being + explored. + + .. warning:: + + Enabling this setting *will* generate considerably more logs! Specifically, for aggregate queries and database operations, this will generate `Q` * `N` times more logs than a singular doc request taking only `N` inreacting with a singular shard range. See the note above about this being a temporary setting during the experimental stages of CSRT. + + Toggle to enable possibility of RPC process lifecycle reports:: + + [csrt] + enable_rpc_reporting = false + + .. config:option:: should_truncate_reports :: truncate zero values from lifecyle reports + + enables truncation of the csrt process lifetime reports to not include + any fields that are zero at the end of process lifetime, eg don't + include `js_filter=0` in the report if the request did not induce + javascript filtering. + + this can be disabled if you really care about consistent fields in the + report logs, but this is a log space saving mechanism, similar to + disabling rpc reporting by default, as its a simple way to reduce + overall volume + + Truncate zero values from process lifecycle reports, enabled by default: + + [csrt] + should_truncate_reports = true + + .. config:option:: query_limit :: Maximum quantity of rows to return in CSRT query/http requests. + + Limit the quantity of rows that can be loaded in an http query.:: + + [csrt] + query_limit = 100 + + .. config:option:: query_cardinality_limit :: Maximum quantity of rows to allow in CSRT query/http requests. + + Limit the quantity of rows that can be loaded in an http query.:: + + [csrt] + query_cardinality_limit = 10000 + +.. 
_csrt-logger-matcher-configuration: + +CSRT Logger Matcher Configuration +================================= + +There are currently eight builtin default logger matchers designed to make it +easy to do filtering on heavy resource usage inducing and long running +requests. These are designed as a simple baseline of useful matchers, declared +in a manner amenable to `default.ini` based constructs. More expressive matcher +declarations are being explored, and matchers of arbitrary complexity can be +registered directly through `remsh`. The default matchers are all designed around +an integer config Threshold that triggers on a specific field, eg docs read, or +on a delta of fields for long requests and changes requests that process many +rows but return few. + +The current default matchers are: + + * `all_coordinators`: match all Coordinators handling HTTP requests + + * :config:option:`Enable <csrt_logger.matchers_enabled/all_coordinators>` | none + + * `all_rpc_workers`: match all RPC Worker handling internal requests + + * :config:option:`Enable <csrt_logger.matchers_enabled/all_rpc_workers>` | none + + * `docs_read`: match all requests reading more than N docs + + * :config:option:`Enable <csrt_logger.matchers_enabled/docs_read>` | :config:option:`Threshold <csrt_logger.matchers_threshold/docs_read>` + + * `rows_read`: match all requests reading more than N rows + + * :config:option:`Enable <csrt_logger.matchers_enabled/rows_read>` | :config:option:`Threshold <csrt_logger.matchers_threshold/rows_read>` + + * `docs_written`: match all requests writing more than N docs + + * :config:option:`Enable <csrt_logger.matchers_enabled/docs_written>` | :config:option:`Threshold <csrt_logger.matchers_threshold/docs_written>` + + * `ioq_calls`: match all requests inducing more than N ioq_calls + + * :config:option:`Enable <csrt_logger.matchers_enabled/ioq_calls>` | :config:option:`Threshold <csrt_logger.matchers_threshold/ioq_calls>` + + * `long_reqs`: match all requests 
lasting more than N milliseconds + + * :config:option:`Enable <csrt_logger.matchers_enabled/long_reqs>` | :config:option:`Threshold <csrt_logger.matchers_threshold/long_reqs>` + + * `changes_processed`: match all changes requests that returned at least N rows + less than was necessarily loaded to complete the request (eg find heavy + filtered changes requests reading many rows but returning few). + + * :config:option:`Enable <csrt_logger.matchers_enabled/changes_processed>` | :config:option:`Threshold <csrt_logger.matchers_threshold/changes_processed>` + +Each of the default matchers has an enablement setting in +ref:`csrt-logger-matcher-configuration-enablement` for toggling enablement of Review Comment: May need another `:` in front as it doesn't render as a link reference currently.
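For illustration, a minimal sketch of how the cross-reference in the quoted hunk would read with the leading colon added. This assumes the label `csrt-logger-matcher-configuration-enablement` is defined elsewhere in `csrt.rst` (it is not visible in this hunk), and the rest of the sentence is left out because the hunk ends mid-sentence:

```rst
.. Sphinx roles take the form :role:`target`; without the leading colon,
.. "ref:" is rendered as literal text rather than as a link.

Each of the default matchers has an enablement setting in
:ref:`csrt-logger-matcher-configuration-enablement`
```

With both colons in place, Sphinx should resolve the role to the labelled section and render its title as the link text.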