GitHub user keith-turner commented on a diff in the pull request:
--- Diff: docs/metrics.md ---
@@ -1,69 +1,112 @@
# Fluo Metrics
-Fluo core is instrumented using [dropwizard metrics]. This allows fluo
users to easily gather
-information about Fluo by configuring different reporters. While
dropwizard can be configured to
-report Fluo metrics to many different tools, below are some tools that
have been used with Fluo.
+A Fluo application can be configured (in [fluo.properties]) to report
metrics. When metrics are
+configured, Fluo will report some 'default' metrics about an
application that help users
+monitor its performance. Users can also write code to report 'application'
metrics from their
+applications. Both 'application' and 'default' metrics share the same
reporter configured by
+[fluo.properties] and are described in detail below.
-1. [Grafana/InfluxDB] - Fluo has [documentation] for sending
metrics to InfluxDB and viewing
- them in Grafana.
+## Configuring reporters
-2. JMX - Fluo can be configured to reports metrics via JMX which can be
viewed in jconsole or
+Fluo metrics are not published by default. To publish metrics, configure a
reporter in the 'metrics'
+section of [fluo.properties]. There are several different reporter types
(e.g. Console, CSV,
+Graphite, JMX, SLF4J) that are implemented using [Dropwizard]. The choice
of which reporter to use
+depends on the visualization tool used. If you are not currently using a
visualization tool, there
+is [documentation][grafana] for reporting Fluo metrics to Grafana/InfluxDB.
-3. CSV - Fluo can be configured to output metrics as CSV to a specified
+## Metrics names
-## Configuring Reporters
+When Fluo metrics are reported, they are published using a naming scheme
that encodes additional
-In order to configure metrics reporters, look at the metrics section in an
-`fluo.properties` file. This sections has a lot of commented out options
for configuring reporters.
+Default metrics start with `fluo.class` or `fluo.system` and have
the following naming schemes:
+Application metrics start with `fluo.app` and have the following scheme:
+The additional information encoded in the schemes above is described
-The frequency is in seconds for all reporters.
+1. `APPLICATION` - Fluo application name
+2. `REPORTER_ID` - Unique ID of the Fluo oracle, worker, or client that is
reporting the metric.
+ When running in yarn, this ID is of the format `worker-<instance id>`
or `oracle-<instance id>`.
+ When not running from yarn, this ID consists of a hostname and a
base36 long that is unique
+ across all fluo processes.
+3. `METRIC` - Name of the metric. For 'default' metrics, this is set by
Fluo. For 'application'
+ metrics, this is set by user. Name should be unique and avoid using
period '.' in name.
+4. `CLASS` - Name of Fluo observer or loader class that produced metric.
This allows things like
+ transaction collisions to be tracked per class.
+## Application metrics
-## Metrics reported by Fluo
+Application metrics are implemented by retrieving a [MetricsReporter] from
an [Observer], [Loader],
+or [FluoClient]. These metrics are named using the format
-All metrics reported by Fluo have the prefix `fluo.<APP>.<PID>.` which is
denoted by `<prefix>` in
-the table below. In the prefix, `<APP>` represents the Fluo application
name and `<PID>` is the
-process ID of the Fluo oracle or worker that is reporting the metric. When
running in yarn, this id
-is of the format `worker-<instance id>` or `oracle-<instance id>`. When
not running from yarn, this
-id consist of a hostname and a base36 long that is unique across all fluo
+## Default metrics
-Some of the metrics reported have the class name as the suffix. This
classname is the observer or
-load task that executed the transactions. This should allow things like
transaction collisions to
-be tracked per class. In the table below this is denoted with `<cn>`.
+Default metrics report for a particular Observer/Loader class or
-|Metric | Type | Description
-|\<prefix\>.tx.lock_wait_time.\<cn\> | [Timer][T] | *WHEN:* After
each transaction. *COND:* > 0 *WHAT:* Time transaction spent waiting on
locks held by other transactions. |
-|\<prefix\>.tx.execution_time.\<cn\> | [Timer][T] | *WHEN:* After
each transaction. *WHAT:* Time transaction took to execute. Updated for failed
and successful transactions. This does not include commit time, only the time
from start until commit is called. |
-|\<prefix\>.tx.with_collision.\<cn\> | [Meter][M] | *WHEN:* After
each transaction. *WHAT:* Rate of transactions with collisions. |
-|\<prefix\>.tx.collisions.\<cn\> | [Meter][M] | *WHEN:* After
each transaction. *WHAT:* Rate of collisions. |
-|\<prefix\>.tx.entries_set.\<cn\> | [Meter][H] | *WHEN:* After
each transaction. *WHAT:* Rate of row/columns set by transaction |
-|\<prefix\>.tx.entries_read.\<cn\> | [Meter][H] | *WHEN:* After
each transaction. *WHAT:* Rate of row/columns read by transaction that existed.
There is currently no count of all reads (including non-existent data) |
-|\<prefix\>.tx.locks_timedout.\<cn\> | [Meter][M] | *WHEN:* After
each transaction. *WHAT:* Rate of timedout locks rolled back by transaction.
These are locks that are held for very long periods by another transaction that
appears to be alive based on zookeeper. |
-|\<prefix\>.tx.locks_dead.\<cn\> | [Meter][M] | *WHEN:* After
each transaction. *WHAT:* Rate of dead locks rolled by a transaction. These are
locks held by a process that appears to be dead according to zookeeper. |
-|\<prefix\>.tx.status_\<status\>.\<cn\> | [Meter][M] | *WHEN:* After
each transaction. *WHAT:* Rate of different ways a transaction can terminate |
-|\<prefix\>.oracle.response_time | [Timer][T] | *WHEN:* For
each request for stamps to the server. *WHAT:* Time RPC call to oracle took |
-|\<prefix\>.oracle.client_stamps | [Histogram][H] | *WHEN:* For
each request for stamps to the server. *WHAT:* The number of stamps requested. |
-|\<prefix\>.oracle.server_stamps | [Histogram][H] | *WHEN:* For
each request for stamps from a client. *WHAT:* The number of stamps requested. |
-|\<prefix\>.worker.notifications_queued | [Gauge][G] | *WHAT:* The
current number of notifications queued for processing. |
-|\<prefix\>.transactor.committing | [Gauge][G] | *WHAT:* The
current number of transactions that are working their way through the commit
+Below are metrics that are reported from each Observer/Loader class that
is configured in a Fluo
+application. These metrics are reported after each transaction and named
using the format
--- End diff --
It's not clear how the things below fit into the pattern. Maybe if it said
`Below are metric ids` it would be a little clearer.