[storm] branch master updated: STORM-3791 update metric documentation (#3409)

agresch Wed, 18 Aug 2021 07:50:05 -0700

This is an automated email from the ASF dual-hosted git repository.

agresch pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/storm.git



The following commit(s) were added to refs/heads/master by this push:
     new 0787d86  STORM-3791 update metric documentation (#3409)
0787d86 is described below

commit 0787d86010d83269b616e6f09c70ee45784b06d8
Author: agresch <[email protected]>
AuthorDate: Wed Aug 18 09:49:38 2021 -0500

    STORM-3791 update metric documentation (#3409)
    
    * STORM-3791 update metric documentation
---
 docs/ClusterMetrics.md    |  2 ++
 docs/LocalityAwareness.md |  2 +-
 docs/Metrics.md           | 24 ++++++++++--------------
 docs/metrics_v2.md        |  1 +
 4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/docs/ClusterMetrics.md b/docs/ClusterMetrics.md
index 6626054..09955fd 100644
--- a/docs/ClusterMetrics.md
+++ b/docs/ClusterMetrics.md
@@ -180,11 +180,13 @@ Metrics associated with the supervisor, which launches 
the workers for a topolog
 | supervisor:blob-localization-duration | timer | Approximately how long it 
takes to get the blob we want after it is requested. |
 | supervisor:current-reserved-memory-mb | gauge | total amount of memory 
reserved for workers on the supervisor (MB) |
 | supervisor:current-used-memory-mb | gauge | memory currently used as 
measured by the supervisor (this typically requires cgroups) (MB) |
+| supervisor:health-check-timeouts | meter | tracks timeouts executing health 
check scripts |
 | supervisor:local-resource-file-not-found-when-releasing-slot | meter | 
number of times file-not-found exception happens when reading local blobs upon 
releasing slots |
 | supervisor:num-blob-update-version-changed | meter | number of times a 
version of a blob changes. |
 | supervisor:num-cleanup-exceptions | meter | exceptions thrown during 
container cleanup. |
 | supervisor:num-force-kill-exceptions | meter | exceptions thrown during 
force kill. |
 | supervisor:num-kill-exceptions | meter | exceptions thrown during kill. |
+| supervisor:num-kill-worker-errors | meter | errors killing workers. |
 | supervisor:num-launched | meter | number of times the supervisor is 
launched. |
 | supervisor:num-shell-exceptions | meter | number of exceptions calling shell 
commands. |
 | supervisor:num-slots-used-gauge | gauge | number of slots used on the 
supervisor. |
diff --git a/docs/LocalityAwareness.md b/docs/LocalityAwareness.md
index 0bcd1af..517f1c7 100644
--- a/docs/LocalityAwareness.md
+++ b/docs/LocalityAwareness.md
@@ -48,7 +48,7 @@ If the downstream executor located on the same worker as the 
executor `E`, the l
 The capacity of a bolt executor on Storm UI is calculated as:
   * (number executed * average execute latency) / measurement time
 
-It basically means how busy this executor is. If this is around 1.0, the 
corresponding Bolt is running as fast as it can. 
+It basically means how busy this executor is. If this is around 1.0, the 
corresponding Bolt is running as fast as it can. A `__capacity` metric exists 
to track this value for each executor.
 
 The `Capacity` is not related to the `Load`:
 
diff --git a/docs/Metrics.md b/docs/Metrics.md
index 9d21f50..11e5f27 100644
--- a/docs/Metrics.md
+++ b/docs/Metrics.md
@@ -213,37 +213,33 @@ This metric records how many errors were reported by a 
spout/bolt. It is the tot
 
 #### Queue Metrics
 
-Each bolt or spout instance in a topology has a receive queue and a send 
queue.  Each worker also has a queue for sending messages to other workers.  
All of these have metrics that are reported.
+Each bolt or spout instance in a topology has a receive queue.  Each worker 
also has a worker transfer queue for sending messages to other workers.  All of 
these have metrics that are reported.
 
-The receive queue metrics are reported under the `__receive` name and send 
queue metrics are reported under the `__sendqueue` for the given bolt/spout 
they are a part of.  The metrics for the queue that sends messages to other 
workers is under the `__transfer` metric name for the system bolt (`__system`).
+The receive queue metrics are reported under the `receive_queue` name.  The 
metrics for the queue that sends messages to other workers are under the 
`worker-transfer-queue` metric name for the system bolt (`__system`).
 
-They all have the form.
+These queues report the following metrics:
 
 ```
 {
     "arrival_rate_secs": 1229.1195171893523,
     "overflow": 0,
-    "read_pos": 103445,
-    "write_pos": 103448,
     "sojourn_time_ms": 2.440771591407277,
     "capacity": 1024,
-    "population": 19
-    "tuple_population": 200
+    "population": 19,
+    "pct_full": "0.018",
+    "insert_failures": "0",
+    "dropped_messages": "0"
 }
 ```
-In storm we sometimes batch multiple tuples into a single entry in the 
disruptor queue. This batching is an optimization that has been in storm in 
some form since the beginning, but the metrics did not always reflect this so 
be careful with how you interpret the metrics and pay attention to which 
metrics are for tuples and which metrics are for entries in the disruptor 
queue. The `__receive` and `__transfer` queues can have batching but the 
`__sendqueue` should not.
 
 `arrival_rate_secs` is an estimation of the number of tuples that are inserted 
into the queue in one second, although it is actually the dequeue rate.
 The `sojourn_time_ms` is calculated from the arrival rate and is an estimate 
of how many milliseconds each tuple sits in the queue before it is processed.
-Prior to STORM-2621 (v1.1.1, v1.2.0, and v2.0.0) these were the rate of 
entries, not of tuples.
 
-A disruptor queue has a set maximum number of entries.  If the regular queue 
fills up an overflow queue takes over.  The number of tuple batches stored in 
this overflow section are represented by the `overflow` metric.  Storm also 
does some micro batching of tuples for performance/efficiency reasons so you 
may see the overflow with a very small number in it even if the queue is not 
full.
+The queue has a set maximum number of entries.  If the regular queue fills up, 
an overflow queue takes over.  The number of tuples stored in this overflow 
section is represented by the `overflow` metric.  Note that an overflow queue 
is only used for executors to receive tuples from remote workers. It doesn't 
apply to intra-worker tuple transfer.
 
-`read_pos` and `write_pos` are internal disruptor accounting numbers.  You can 
think of them almost as the total number of entries written (`write_pos`) or 
read (`read_pos`) since the queue was created.  They allow for integer overflow 
so if you use them please take that into account.
+`capacity` is the maximum number of entries in the queue. `population` is the 
number of entries currently in the queue. `pct_full` tracks the 
percentage of capacity in use.
 
-`capacity` is the maximum number of entries in the disruptor queue. 
`population` is the number of entries currently filled in the queue.
-
-`tuple_population` is the number of tuples currently in the queue as opposed 
to the number of entries.  This was added at the same time as STORM-2621 
(v1.1.1, v1.2.0, and v2.0.0)
+`insert_failures` tracks the number of failures inserting into the queue. 
`dropped_messages` tracks messages dropped due to the overflow queue being full.
 
 #### System Bolt (Worker) Metrics
 
diff --git a/docs/metrics_v2.md b/docs/metrics_v2.md
index c46459d..d65635c 100644
--- a/docs/metrics_v2.md
+++ b/docs/metrics_v2.md
@@ -145,6 +145,7 @@ to using the long metric name, but can report the short 
name by configuring repo
 ## Backwards Compatibility Notes
 
 1. V2 metrics can also be reported to the Metrics Consumers registered with 
`topology.metrics.consumer.register` by enabling the 
`topology.enable.v2.metrics.tick` configuration.
+The rate at which they are reported to Metrics Consumers is controlled by 
`topology.v2.metrics.tick.interval.seconds`, defaulting to every 60 seconds.
 
 2. Starting from storm 2.3, the config `storm.metrics.reporters` is deprecated 
in favor of `topology.metrics.reporters`.
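
As a rough illustration of two quantities touched by this patch, the sketch below computes the bolt capacity formula quoted in the LocalityAwareness.md hunk, (number executed * average execute latency) / measurement time, and a queue fill ratio from the `population` and `capacity` fields shown in the Metrics.md sample. The function names are hypothetical and are not part of Storm. It assumes latency and measurement time are both in milliseconds, and that `pct_full` corresponds to population divided by capacity; the patch does not state that formula or its rounding.

```python
def bolt_capacity(executed: int, avg_execute_latency_ms: float, window_ms: float) -> float:
    """Capacity as quoted in the docs: (executed * average execute latency) / measurement time.

    Assumption: latency and window are expressed in the same unit (milliseconds here).
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return executed * avg_execute_latency_ms / window_ms


def queue_fill_ratio(population: int, capacity: int) -> float:
    """Fraction of queue entries in use.

    Assumption: this mirrors what `pct_full` reports; the patch does not give the formula.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return population / capacity


if __name__ == "__main__":
    # Hypothetical executor: 50,000 tuples at 1.1 ms each over a 60 s window.
    print(bolt_capacity(50_000, 1.1, 60_000))
    # Values taken from the sample metrics in the Metrics.md hunk.
    print(queue_fill_ratio(19, 1024))
```

A capacity value near 1.0 would mean the executor spent almost the whole window executing tuples, which matches the "running as fast as it can" wording in the docs.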
