Repository: mesos Updated Branches: refs/heads/master 6b00c3243 -> f16d73852
Added documentation on monitoring metrics and alerts. Review: https://reviews.apache.org/r/33241 Project: http://git-wip-us.apache.org/repos/asf/mesos/repo Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/f16d7385 Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/f16d7385 Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/f16d7385 Branch: refs/heads/master Commit: f16d73852623ee05cc13d2757115f7815e608964 Parents: 6b00c32 Author: Ricardo Cervera-Navarro <[email protected]> Authored: Wed Jun 24 12:01:39 2015 -0700 Committer: Vinod Kone <[email protected]> Committed: Wed Jun 24 12:01:39 2015 -0700 ---------------------------------------------------------------------- docs/home.md | 1 + docs/monitoring.md | 1057 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 1058 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/mesos/blob/f16d7385/docs/home.md ---------------------------------------------------------------------- diff --git a/docs/home.md b/docs/home.md index d990cbe..bc27791 100644 --- a/docs/home.md +++ b/docs/home.md @@ -20,6 +20,7 @@ layout: documentation * [Logging and Debugging](/documentation/latest/logging-and-debugging/) for viewing Mesos and framework logs. * [High Availability](/documentation/latest/high-availability/) for running multiple masters simultaneously. * [Operational Guide](/documentation/latest/operational-guide/) +* [Monitoring](/documentation/latest/monitoring/) * [Network Monitoring](/documentation/latest/network-monitoring/) * [Slave Recovery](/documentation/latest/slave-recovery/) for doing seamless upgrades. * [Tools](/documentation/latest/tools/) for setting up and running a Mesos cluster. 
http://git-wip-us.apache.org/repos/asf/mesos/blob/f16d7385/docs/monitoring.md ---------------------------------------------------------------------- diff --git a/docs/monitoring.md b/docs/monitoring.md new file mode 100644 index 0000000..d80f936 --- /dev/null +++ b/docs/monitoring.md @@ -0,0 +1,1057 @@ +--- +layout: documentation +--- + + +# Mesos Observability Metrics + +This document describes the observability metrics provided by Mesos master and +slave nodes. This document also provides some initial guidance on which metrics +you should monitor to detect abnormal situations in your cluster. + + +## Overview + +Mesos master and slave nodes report a set of statistics and metrics that enable +you to monitor resource usage and detect abnormal situations early. The +information reported by Mesos includes details about available resources, used +resources, registered frameworks, active slaves, and task state. You can use +this information to create automated alerts and to plot different metrics over +time inside a monitoring dashboard. + + +## Metric Types + +Mesos provides two different kinds of metrics: counters and gauges. + +**Counters** keep track of discrete events and are monotonically increasing. The +value of a metric of this type is always a natural number. Examples include the +number of failed tasks and the number of slave registrations. For some metrics +of this type, the rate of change is often more useful than the value itself. + +**Gauges** represent an instantaneous sample of some magnitude. Examples include +the amount of used memory in the cluster and the number of connected slaves. For +some metrics of this type, it is often useful to determine whether the value is +above or below a threshold for a sustained period of time. + +The tables in this document indicate the type of each available metric. 
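Because a counter only increases, a monitoring system typically derives its rate of change from two successive samples. The following Python sketch shows one way to do that. The function name `counter_rate` and its reset handling are illustrative assumptions, not part of Mesos.

```python
def counter_rate(prev_value, prev_time, cur_value, cur_time):
    """Return the per-second rate of change of a counter between two samples.

    Hypothetical helper: the values are two readings of a counter such as
    master/tasks_lost, and the times are expressed in seconds.
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = cur_value - prev_value
    if delta < 0:
        # Assumption: a decrease means the process restarted and the counter
        # began again from zero, so only the new value is counted.
        delta = cur_value
    return delta / elapsed
```

For example, `counter_rate(10, 0, 40, 60)` describes 30 events over 60 seconds. A gauge needs no such step, because its sampled value can be compared to a threshold directly.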
+ + +## Master Nodes + +Metrics from the master node are available at the following URL: + + http://<mesos-master-ip>:5050/metrics/snapshot + +The response is a JSON object that contains metric names and values as +key-value pairs. + +### Observability Metrics + +This section lists all available metrics from Mesos master nodes grouped by +category. + +#### Resources + +The following metrics provide information about the total resources available in +the cluster and their current usage. High resource usage for sustained periods +of time may indicate that you need to add capacity to your cluster or that a +framework is misbehaving. + +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>master/cpus_percent</code> + </td> + <td>Percentage of allocated CPUs</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/cpus_used</code> + </td> + <td>Number of allocated CPUs</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/cpus_total</code> + </td> + <td>Number of CPUs</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/disk_percent</code> + </td> + <td>Percentage of allocated disk space</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/disk_used</code> + </td> + <td>Allocated disk space in MB</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/disk_total</code> + </td> + <td>Disk space in MB</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/mem_percent</code> + </td> + <td>Percentage of allocated memory</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/mem_used</code> + </td> + <td>Allocated memory in MB</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/mem_total</code> + </td> + <td>Memory in MB</td> + <td>Gauge</td> +</tr> +</table> + +#### Master + +The following metrics provide information about whether a master is currently +elected and how long it has been running. 
A cluster with no elected master +for sustained periods of time indicates a malfunctioning cluster. This +points to either leadership election issues (so check the connection to +ZooKeeper) or a flapping Master process. A low uptime value indicates that the +master has restarted recently. + +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>master/elected</code> + </td> + <td>Whether this is the elected master</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/uptime_secs</code> + </td> + <td>Uptime in seconds</td> + <td>Gauge</td> +</tr> +</table> + +#### System + +The following metrics provide information about the resources available on this +master node and their current usage. High resource usage in a master node for +sustained periods of time may degrade the performance of the cluster. + +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>system/cpus_total</code> + </td> + <td>Number of CPUs available in this master node</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/load_15min</code> + </td> + <td>Load average for the past 15 minutes</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/load_5min</code> + </td> + <td>Load average for the past 5 minutes</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/load_1min</code> + </td> + <td>Load average for the past minute</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/mem_free_bytes</code> + </td> + <td>Free memory in bytes</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/mem_total_bytes</code> + </td> + <td>Total memory in bytes</td> + <td>Gauge</td> +</tr> +</table> + +#### Slaves + +The following metrics provide information about slave events, slave counts, and +slave states. A low number of active slaves may indicate that slaves are +unhealthy or that they are not able to connect to the elected master. 
+ +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>master/slave_registrations</code> + </td> + <td>Number of slaves that were able to cleanly re-join the cluster and + connect back to the master after the master is disconnected.</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/slave_removals</code> + </td> + <td>Number of slaves removed for various reasons, including maintenance</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/slave_reregistrations</code> + </td> + <td>Number of slave re-registrations</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/slave_shutdowns_scheduled</code> + </td> + <td>Number of slaves which have failed their health check and are scheduled + to be removed. They will not be immediately removed due to the slave + removal rate limit, but <code>master/slave_shutdowns_completed</code> + will start increasing as they do get removed.</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/slave_shutdowns_cancelled</code> + </td> + <td>Number of cancelled slave shutdowns. This happens when the slave removal + rate limit allows for a slave to reconnect and send a <code>PONG</code> + to the master before being removed.</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/slave_shutdowns_completed</code> + </td> + <td>Number of slaves that failed their health check. 
These are slaves which + were not heard from despite the slave-removal rate limit, and have been + removed from the master's slave registry.</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/slaves_active</code> + </td> + <td>Number of active slaves</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/slaves_connected</code> + </td> + <td>Number of connected slaves</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/slaves_disconnected</code> + </td> + <td>Number of disconnected slaves</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/slaves_inactive</code> + </td> + <td>Number of inactive slaves</td> + <td>Gauge</td> +</tr> +</table> + +#### Frameworks + +The following metrics provide information about the registered frameworks in the +cluster. No active or connected frameworks may indicate that a scheduler is not +registered or that it is misbehaving. + +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>master/frameworks_active</code> + </td> + <td>Number of active frameworks</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/frameworks_connected</code> + </td> + <td>Number of connected frameworks</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/frameworks_disconnected</code> + </td> + <td>Number of disconnected frameworks</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/frameworks_inactive</code> + </td> + <td>Number of inactive frameworks</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/outstanding_offers</code> + </td> + <td>Number of outstanding resource offers</td> + <td>Gauge</td> +</tr> +</table> + +#### Tasks + +The following metrics provide information about active and terminated tasks. A +high rate of lost tasks may indicate that there is a problem with the cluster. +The task states listed here match those of the task state machine. 
+ +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>master/tasks_error</code> + </td> + <td>Number of tasks that were invalid</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/tasks_failed</code> + </td> + <td>Number of failed tasks</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/tasks_finished</code> + </td> + <td>Number of finished tasks</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/tasks_killed</code> + </td> + <td>Number of killed tasks</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/tasks_lost</code> + </td> + <td>Number of lost tasks</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/tasks_running</code> + </td> + <td>Number of running tasks</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/tasks_staging</code> + </td> + <td>Number of staging tasks</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/tasks_starting</code> + </td> + <td>Number of starting tasks</td> + <td>Gauge</td> +</tr> +</table> + +#### Messages + +The following metrics provide information about messages between the master and +the slaves and between the framework and the executors. A high rate of dropped +messages may indicate that there is a problem with the network. 
+ +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>master/invalid_framework_to_executor_messages</code> + </td> + <td>Number of invalid framework to executor messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/invalid_status_update_acknowledgements</code> + </td> + <td>Number of invalid status update acknowledgements</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/invalid_status_updates</code> + </td> + <td>Number of invalid status updates</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/dropped_messages</code> + </td> + <td>Number of dropped messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_authenticate</code> + </td> + <td>Number of authentication messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_deactivate_framework</code> + </td> + <td>Number of framework deactivation messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_exited_executor</code> + </td> + <td>Number of terminated executor messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_framework_to_executor</code> + </td> + <td>Number of messages from a framework to an executor</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_kill_task</code> + </td> + <td>Number of kill task messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_launch_tasks</code> + </td> + <td>Number of launch task messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_reconcile_tasks</code> + </td> + <td>Number of reconcile task messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_register_framework</code> + </td> + <td>Number of framework registration messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_register_slave</code> + </td> + <td>Number of slave registration messages</td> + <td>Counter</td> +</tr> +<tr> 
+ <td> + <code>master/messages_reregister_framework</code> + </td> + <td>Number of framework re-registration messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_reregister_slave</code> + </td> + <td>Number of slave re-registration messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_resource_request</code> + </td> + <td>Number of resource request messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_revive_offers</code> + </td> + <td>Number of offer revival messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_status_update</code> + </td> + <td>Number of status update messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_status_update_acknowledgement</code> + </td> + <td>Number of status update acknowledgement messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_unregister_framework</code> + </td> + <td>Number of framework unregistration messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/messages_unregister_slave</code> + </td> + <td>Number of slave unregistration messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/valid_framework_to_executor_messages</code> + </td> + <td>Number of valid framework to executor messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/valid_status_update_acknowledgements</code> + </td> + <td>Number of valid status update acknowledgement messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>master/valid_status_updates</code> + </td> + <td>Number of valid status update messages</td> + <td>Counter</td> +</tr> +</table> + +#### Event queue + +The following metrics provide information about different types of events in the +event queue. 
+ +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>master/event_queue_dispatches</code> + </td> + <td>Number of dispatches in the event queue</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/event_queue_http_requests</code> + </td> + <td>Number of HTTP requests in the event queue</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>master/event_queue_messages</code> + </td> + <td>Number of messages in the event queue</td> + <td>Gauge</td> +</tr> +</table> + +#### Registrar + +The following metrics provide information about read and write latency to the +slave registrar. + +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>registrar/state_fetch_ms</code> + </td> + <td>Registry read latency in ms </td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>registrar/state_store_ms</code> + </td> + <td>Registry write latency in ms </td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>registrar/state_store_ms/max</code> + </td> + <td>Maximum registry write latency in ms</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>registrar/state_store_ms/min</code> + </td> + <td>Minimum registry write latency in ms</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>registrar/state_store_ms/p50</code> + </td> + <td>Median registry write latency in ms</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>registrar/state_store_ms/p90</code> + </td> + <td>90th percentile registry write latency in ms</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>registrar/state_store_ms/p95</code> + </td> + <td>95th percentile registry write latency in ms</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>registrar/state_store_ms/p99</code> + </td> + <td>99th percentile registry write latency in ms</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>registrar/state_store_ms/p999</code> + </td> + <td>99.9th percentile registry write latency in ms</td> + 
<td>Gauge</td> +</tr> +<tr> + <td> + <code>registrar/state_store_ms/p9999</code> + </td> + <td>99.99th percentile registry write latency in ms</td> + <td>Gauge</td> +</tr> +</table> + + +### Basic Alerts + +This section lists some examples of basic alerts that you can use to detect +abnormal situations in a cluster. + +#### master/uptime_secs is low + +The master has restarted. + +#### master/uptime_secs < 60 for sustained periods of time + +The cluster has a flapping master node. + +#### master/tasks_lost is increasing rapidly + +Tasks in the cluster are disappearing. Possible causes include hardware +failures, bugs in one of the frameworks, or bugs in Mesos. + +#### master/slaves_active is low + +Slaves are having trouble connecting to the master. + +#### master/cpus_percent > 0.9 for sustained periods of time + +Cluster CPU utilization is close to capacity. + +#### master/mem_percent > 0.9 for sustained periods of time + +Cluster memory utilization is close to capacity. + +#### master/elected is 0 for sustained periods of time + +No master is currently elected. + + + + +## Slave Nodes + +Metrics from each slave node are available at the following URL: + + http://<mesos-slave>:5051/metrics/snapshot + +The response is a JSON object that contains metric names and values as +key-value pairs. + + +### Observability Metrics + +This section lists all available metrics from Mesos slave nodes grouped by +category. + +#### Resources + +The following metrics provide information about the total resources available in +the slave and their current usage. 
+ +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>slave/cpus_percent</code> + </td> + <td>Percentage of allocated CPUs</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/cpus_used</code> + </td> + <td>Number of allocated CPUs</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/cpus_total</code> + </td> + <td>Number of CPUs</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/disk_percent</code> + </td> + <td>Percentage of allocated disk space</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/disk_used</code> + </td> + <td>Allocated disk space in MB</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/disk_total</code> + </td> + <td>Disk space in MB</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/mem_percent</code> + </td> + <td>Percentage of allocated memory</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/mem_used</code> + </td> + <td>Allocated memory in MB</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/mem_total</code> + </td> + <td>Memory in MB</td> + <td>Gauge</td> +</tr> +</table> + +#### Slave + +The following metrics provide information about whether a slave is currently +registered with a master and for how long it has been running. + +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>slave/registered</code> + </td> + <td>Whether this slave is registered with a master</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/uptime_secs</code> + </td> + <td>Uptime in seconds</td> + <td>Gauge</td> +</tr> +</table> + +#### System + +The following metrics provide information about the slave system. 
+ +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>system/cpus_total</code> + </td> + <td>Number of CPUs available</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/load_15min</code> + </td> + <td>Load average for the past 15 minutes</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/load_5min</code> + </td> + <td>Load average for the past 5 minutes</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/load_1min</code> + </td> + <td>Load average for the past minute</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/mem_free_bytes</code> + </td> + <td>Free memory in bytes</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>system/mem_total_bytes</code> + </td> + <td>Total memory in bytes</td> + <td>Gauge</td> +</tr> +</table> + +#### Executors + +The following metrics provide information about the executor instances running +on the slave. + +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>slave/frameworks_active</code> + </td> + <td>Number of active frameworks</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/executors_registering</code> + </td> + <td>Number of executors registering</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/executors_running</code> + </td> + <td>Number of executors running</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/executors_terminated</code> + </td> + <td>Number of terminated executors</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>slave/executors_terminating</code> + </td> + <td>Number of terminating executors</td> + <td>Gauge</td> +</tr> +</table> + +#### Tasks + +The following metrics provide information about active and terminated tasks. 
+ +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>slave/tasks_failed</code> + </td> + <td>Number of failed tasks</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>slave/tasks_finished</code> + </td> + <td>Number of finished tasks</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>slave/tasks_killed</code> + </td> + <td>Number of killed tasks</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>slave/tasks_lost</code> + </td> + <td>Number of lost tasks</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>slave/tasks_running</code> + </td> + <td>Number of running tasks</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/tasks_staging</code> + </td> + <td>Number of staging tasks</td> + <td>Gauge</td> +</tr> +<tr> + <td> + <code>slave/tasks_starting</code> + </td> + <td>Number of starting tasks</td> + <td>Gauge</td> +</tr> +</table> + +#### Messages + +The following metrics provide information about messages between the slave and +the master it is registered with. + +<table class="table table-striped"> +<thead> +<tr><th>Metric</th><th>Description</th><th>Type</th> +</thead> +<tr> + <td> + <code>slave/invalid_framework_messages</code> + </td> + <td>Number of invalid framework messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>slave/invalid_status_updates</code> + </td> + <td>Number of invalid status updates</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>slave/valid_framework_messages</code> + </td> + <td>Number of valid framework messages</td> + <td>Counter</td> +</tr> +<tr> + <td> + <code>slave/valid_status_updates</code> + </td> + <td>Number of valid status updates</td> + <td>Counter</td> +</tr> +</table>
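The threshold-based entries in the Basic Alerts section can be evaluated against the JSON object returned by `/metrics/snapshot`. The Python sketch below is one possible way to do so, using only the standard library. The helper names `fetch_snapshot`, `check_master_alerts`, and `sustained` are hypothetical, and reading "sustained periods of time" as "fired in each of the last few polls" is an assumption rather than something Mesos defines.

```python
import json
from urllib.request import urlopen


def fetch_snapshot(host, port=5050, timeout=5):
    """Fetch /metrics/snapshot; port 5050 for a master, 5051 for a slave."""
    url = "http://{}:{}/metrics/snapshot".format(host, port)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_master_alerts(snapshot, max_usage=0.9, min_uptime_secs=60):
    """Return the names of the basic alerts that fire for one snapshot."""
    alerts = []
    if snapshot.get("master/elected") == 0:
        alerts.append("master/elected is 0")
    uptime = snapshot.get("master/uptime_secs")
    if uptime is not None and uptime < min_uptime_secs:
        alerts.append("master/uptime_secs < {}".format(min_uptime_secs))
    for key in ("master/cpus_percent", "master/mem_percent"):
        value = snapshot.get(key)
        if value is not None and value > max_usage:
            alerts.append("{} > {}".format(key, max_usage))
    return alerts


def sustained(alert_history, name, min_samples=3):
    """True if alert `name` fired in each of the last `min_samples` polls.

    `alert_history` is a list of check_master_alerts() results, oldest first.
    """
    recent = alert_history[-min_samples:]
    return len(recent) == min_samples and all(name in fired for fired in recent)
```

A poller might call `check_master_alerts(fetch_snapshot("<mesos-master-ip>"))` at a fixed interval, append each result to a list, and raise an alert only once `sustained` returns true for it.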
