This is an automated email from the ASF dual-hosted git repository.
agresch pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/storm.git
The following commit(s) were added to refs/heads/master by this push:
new 0787d86 STORM-3791 update metric documentation (#3409)
0787d86 is described below
commit 0787d86010d83269b616e6f09c70ee45784b06d8
Author: agresch <[email protected]>
AuthorDate: Wed Aug 18 09:49:38 2021 -0500
STORM-3791 update metric documentation (#3409)
* STORM-3791 update metric documentation
---
docs/ClusterMetrics.md | 2 ++
docs/LocalityAwareness.md | 2 +-
docs/Metrics.md | 24 ++++++++++--------------
docs/metrics_v2.md | 1 +
4 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/docs/ClusterMetrics.md b/docs/ClusterMetrics.md
index 6626054..09955fd 100644
--- a/docs/ClusterMetrics.md
+++ b/docs/ClusterMetrics.md
@@ -180,11 +180,13 @@ Metrics associated with the supervisor, which launches
the workers for a topolog
| supervisor:blob-localization-duration | timer | Approximately how long it
takes to get the blob we want after it is requested. |
| supervisor:current-reserved-memory-mb | gauge | total amount of memory
reserved for workers on the supervisor (MB) |
| supervisor:current-used-memory-mb | gauge | memory currently used as
measured by the supervisor (this typically requires cgroups) (MB) |
+| supervisor:health-check-timeouts | meter | tracks timeouts executing health
check scripts |
| supervisor:local-resource-file-not-found-when-releasing-slot | meter |
number of times file-not-found exception happens when reading local blobs upon
releasing slots |
| supervisor:num-blob-update-version-changed | meter | number of times a
version of a blob changes. |
| supervisor:num-cleanup-exceptions | meter | exceptions thrown during
container cleanup. |
| supervisor:num-force-kill-exceptions | meter | exceptions thrown during
force kill. |
| supervisor:num-kill-exceptions | meter | exceptions thrown during kill. |
+| supervisor:num-kill-worker-errors | meter | errors killing workers. |
| supervisor:num-launched | meter | number of times the supervisor is
launched. |
| supervisor:num-shell-exceptions | meter | number of exceptions calling shell
commands. |
| supervisor:num-slots-used-gauge | gauge | number of slots used on the
supervisor. |
diff --git a/docs/LocalityAwareness.md b/docs/LocalityAwareness.md
index 0bcd1af..517f1c7 100644
--- a/docs/LocalityAwareness.md
+++ b/docs/LocalityAwareness.md
@@ -48,7 +48,7 @@ If the downstream executor located on the same worker as the
executor `E`, the l
The capacity of a bolt executor on Storm UI is calculated as:
* (number executed * average execute latency) / measurement time
-It basically means how busy this executor is. If this is around 1.0, the
corresponding Bolt is running as fast as it can.
+It basically means how busy this executor is. If this is around 1.0, the
corresponding Bolt is running as fast as it can. A `__capacity` metric exists
to track this value for each executor.
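
The capacity formula above can be sketched as follows. This is a minimal illustration, not Storm's implementation: the function name `bolt_capacity` and its parameters are hypothetical, and it assumes the execute latency and the measurement window are both expressed in milliseconds.

```python
def bolt_capacity(executed: int, avg_execute_latency_ms: float, window_ms: float) -> float:
    """Return (number executed * average execute latency) / measurement time.

    Assumption: latency and window share the same unit (milliseconds here).
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return executed * avg_execute_latency_ms / window_ms


# Hypothetical numbers: 50,000 tuples executed at 10 ms each over a 10-minute window.
capacity = bolt_capacity(50_000, 10.0, 10 * 60 * 1000)
print(f"capacity = {capacity:.3f}")
```

A result close to 1.0 would indicate that the executor spent nearly the whole window executing tuples.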
The `Capacity` is not related to the `Load`:
diff --git a/docs/Metrics.md b/docs/Metrics.md
index 9d21f50..11e5f27 100644
--- a/docs/Metrics.md
+++ b/docs/Metrics.md
@@ -213,37 +213,33 @@ This metric records how many errors were reported by a
spout/bolt. It is the tot
#### Queue Metrics
-Each bolt or spout instance in a topology has a receive queue and a send
queue. Each worker also has a queue for sending messages to other workers.
All of these have metrics that are reported.
+Each bolt or spout instance in a topology has a receive queue. Each worker
also has a worker transfer queue for sending messages to other workers. All of
these have metrics that are reported.
-The receive queue metrics are reported under the `__receive` name and send
queue metrics are reported under the `__sendqueue` for the given bolt/spout
they are a part of. The metrics for the queue that sends messages to other
workers is under the `__transfer` metric name for the system bolt (`__system`).
+The receive queue metrics are reported under the `receive_queue` name. The
metrics for the queue that sends messages to other workers are under the
`worker-transfer-queue` metric name for the system bolt (`__system`).
-They all have the form.
+These queues report the following metrics:
```
{
"arrival_rate_secs": 1229.1195171893523,
"overflow": 0,
- "read_pos": 103445,
- "write_pos": 103448,
"sojourn_time_ms": 2.440771591407277,
"capacity": 1024,
- "population": 19
- "tuple_population": 200
+ "population": 19,
+ "pct_full": "0.018",
+ "insert_failures": "0",
+ "dropped_messages": "0"
}
```
-In storm we sometimes batch multiple tuples into a single entry in the
disruptor queue. This batching is an optimization that has been in storm in
some form since the beginning, but the metrics did not always reflect this so
be careful with how you interpret the metrics and pay attention to which
metrics are for tuples and which metrics are for entries in the disruptor
queue. The `__receive` and `__transfer` queues can have batching but the
`__sendqueue` should not.
`arrival_rate_secs` is an estimation of the number of tuples that are inserted
into the queue in one second, although it is actually the dequeue rate.
The `sojourn_time_ms` is calculated from the arrival rate and is an estimate
of how many milliseconds each tuple sits in the queue before it is processed.
-Prior to STORM-2621 (v1.1.1, v1.2.0, and v2.0.0) these were the rate of
entries, not of tuples.
-A disruptor queue has a set maximum number of entries. If the regular queue
fills up an overflow queue takes over. The number of tuple batches stored in
this overflow section are represented by the `overflow` metric. Storm also
does some micro batching of tuples for performance/efficiency reasons so you
may see the overflow with a very small number in it even if the queue is not
full.
+The queue has a set maximum number of entries. If the regular queue fills up,
an overflow queue takes over. The number of tuples stored in this overflow
section is represented by the `overflow` metric. Note that an overflow queue
is only used for executors to receive tuples from remote workers. It doesn't
apply to intra-worker tuple transfer.
-`read_pos` and `write_pos` are internal disruptor accounting numbers. You can
think of them almost as the total number of entries written (`write_pos`) or
read (`read_pos`) since the queue was created. They allow for integer overflow
so if you use them please take that into account.
+`capacity` is the maximum number of entries in the queue. `population` is the
number of entries currently filled in the queue. `pct_full` tracks the
percentage of capacity in use.
-`capacity` is the maximum number of entries in the disruptor queue.
`population` is the number of entries currently filled in the queue.
-
-`tuple_population` is the number of tuples currently in the queue as opposed
to the number of entries. This was added at the same time as STORM-2621
(v1.1.1, v1.2.0, and v2.0.0)
+`insert_failures` tracks the number of failures inserting into the queue.
`dropped_messages` tracks messages dropped due to the overflow queue being full.
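
As a rough illustration of how the queue metrics described above relate to each other, the sketch below parses a sample payload and derives a fill fraction from `population` and `capacity`. The helper names and the 0.8 threshold are hypothetical, and it assumes `pct_full` is reported as a fraction of capacity rather than a 0-100 value; neither is confirmed by the documentation.

```python
import json

# Sample payload modelled on the metric names listed above; the values are illustrative.
sample = json.loads("""
{
    "arrival_rate_secs": 1229.1195171893523,
    "overflow": 0,
    "sojourn_time_ms": 2.440771591407277,
    "capacity": 1024,
    "population": 19,
    "pct_full": "0.018",
    "insert_failures": "0",
    "dropped_messages": "0"
}
""")


def fill_fraction(metrics: dict) -> float:
    """Fraction of queue entries in use, derived from population and capacity."""
    return float(metrics["population"]) / float(metrics["capacity"])


def is_under_pressure(metrics: dict, threshold: float = 0.8) -> bool:
    """Hypothetical health check: flag a queue that is nearly full or losing messages."""
    return (
        fill_fraction(metrics) >= threshold
        or int(metrics["overflow"]) > 0
        or int(metrics["insert_failures"]) > 0
        or int(metrics["dropped_messages"]) > 0
    )


fraction = fill_fraction(sample)
pressured = is_under_pressure(sample)
print(f"fill fraction = {fraction:.4f}, under pressure = {pressured}")
```

Some values are converted with `int()` and `float()` because the sample reports them as strings.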
#### System Bolt (Worker) Metrics
diff --git a/docs/metrics_v2.md b/docs/metrics_v2.md
index c46459d..d65635c 100644
--- a/docs/metrics_v2.md
+++ b/docs/metrics_v2.md
@@ -145,6 +145,7 @@ to using the long metric name, but can report the short
name by configuring repo
## Backwards Compatibility Notes
1. V2 metrics can also be reported to the Metrics Consumers registered with
`topology.metrics.consumer.register` by enabling the
`topology.enable.v2.metrics.tick` configuration.
+The rate at which they will be reported to Metrics Consumers is controlled by
`topology.v2.metrics.tick.interval.seconds`, defaulting to every 60 seconds.
2. Starting from storm 2.3, the config `storm.metrics.reporters` is deprecated
in favor of `topology.metrics.reporters`.