Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/23#discussion_r83222182

--- Diff: docs/fluo/1.0.0-incubating/metrics.md ---
@@ -0,0 +1,117 @@
+---
+layout: fluo-doc
+title: Fluo Metrics
+version: 1.0.0-incubating
+---
+
+A Fluo application can be configured (in [fluo.properties]) to report metrics. When metrics are
+configured, Fluo will report some 'default' metrics about an application that help users monitor its
+performance. Users can also write code to report 'application-specific' metrics from their
+applications. Both 'application-specific' and 'default' metrics share the same reporter configured
+by [fluo.properties] and are described in detail below.
+
+## Configuring reporters
+
+Fluo metrics are not published by default. To publish metrics, configure a reporter in the 'metrics'
+section of [fluo.properties]. There are several different reporter types (i.e Console, CSV,
+Graphite, JMX, SLF4J) that are implemented using [Dropwizard]. The choice of which reporter to use
+depends on the visualization tool used. If you are not currently using a visualization tool, there
+is [documentation][grafana] for reporting Fluo metrics to Grafana/InfluxDB.
+
+## Metrics names
+
+When Fluo metrics are reported, they are published using a naming scheme that encodes additional
+information. This additional information is represented using all caps variables (i.e `METRIC`)
+below.
+
+Default metrics start with `fluo.class` or `fluo.system` and have following naming schemes:
+
+    fluo.class.APPLICATION.REPORTER_ID.METRIC.CLASS
+    fluo.system.APPLICATION.REPORTER_ID.METRIC
+
+Application metrics start with `fluo.app` and have following scheme:
+
+    fluo.app.REPORTER_ID.METRIC
+
+The variables below describe the additional information that is encoded in metrics names.
+
+1. `APPLICATION` - Fluo application name
+2. `REPORTER_ID` - Unique ID of the Fluo oracle, worker, or client that is reporting the metric.
+   When running in YARN, this ID is of the format `worker-INSTANCE_ID` or `oracle-INSTANCE_ID`
+   where `INSTANCE_ID` corresponds to instance number. When not running in YARN, this ID consists
+   of a hostname and a base36 long that is unique across all fluo processes.
+3. `METRIC` - Name of the metric. For 'default' metrics, this is set by Fluo. For 'application'
+   metrics, this is set by user. Name should be unique and avoid using period '.' in name.
+4. `CLASS` - Name of Fluo observer or loader class that produced metric. This allows things like
+   transaction collisions to be tracked per class.
+
+## Application-specific metrics
+
+Application metrics are implemented by retrieving a [MetricsReporter] from an [Observer], [Loader],
+or [FluoClient]. These metrics are named using the format `fluo.app.REPORTER_ID.METRIC`.
+
+## Default metrics
+
+Default metrics report for a particular Observer/Loader class or system-wide.
+
+Below are metrics that are reported from each Observer/Loader class that is configured in a Fluo
+application. These metrics are reported after each transaction and named using the format
+`fluo.class.APPLICATION.REPORTER_ID.METRIC.CLASS`.
+
+* tx_lock_wait_time - [Timer]
+  - Time transaction spent waiting on locks held by other transactions.
+  - Only updated for transactions that have non-zero lock time.
+* tx_execution_time - [Timer]
+  - Time transaction took to execute.
+  - Updated for failed and successful transactions.
+  - This does not include commit time, only the time from start until commit is called.
+* tx_with_collision - [Meter]
+  - Rate of transactions with collisions.
+* tx_collisions - [Meter]
+  - Rate of collisions.
+* tx_entries_set - [Meter]
+  - Rate of row/columns set by transaction
+* tx_entries_read - [Meter]
+  - Rate of row/columns read by transaction that existed.
+  - There is currently no count of all reads (including non-existent data)
+* tx_locks_timedout - [Meter]
+  - Rate of timedout locks rolled back by transaction.
+  - These are locks that are held for very long periods by another transaction that appears to be
+    alive based on zookeeper.
+* tx_locks_dead - [Meter]
+  - Rate of dead locks rolled by a transaction.
+  - These are locks held by a process that appears to be dead according to zookeeper.
+* tx_status_`<STATUS>` - [Meter]
+  - Rate of different ways (i.e `<STATUS>`) a transaction can terminate
+
+Below are system-wide metrics that are reported for the entire Fluo application. These metrics are
+named using the format `fluo.system.APPLICATION.REPORTER_ID.METRIC`.
+
+* oracle_response_time - [Timer]
+  - Time each RPC call to oracle for stamps took
+* oracle_client_stamps - [Histogram]
+  - Number of stamps requested for each request for stamps to the server
+* oracle_server_stamps - [Histogram]
+  - Number of stamps requested for each request for stamps from a client
+* worker_notifications_queued - [Gauge]
+  - The current number of notifications queued for processing.
+* transactor_committing - [Gauge]
+  - The current number of transactions that are working their way through the commit steps.
+
+Histograms and Timers have a counter. In the case of a histogram, the counter is the number of times
+the metric was updated and not a sum of the updates. For example if a request for 5 timestamps was
+made to the oracle followed by a request for 3 timestamps, then the count for `oracle_server_stamps`
+would be 2 and the mean would be (5+3)/2.
+
+[fluo.properties]: https://github.com/apache/fluo/blob/1.0.0-incubating/modules/distribution/src/main/config/fluo.properties
--- End diff --

This link does not work. I got the following link to work.

https://github.com/apache/incubator-fluo/blob/rel/fluo-1.0.0-incubating/modules/distribution/src/main/config/fluo.properties
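For reference, the three naming schemes described in the diff can be illustrated with a small sketch. `MetricNames` and its methods are hypothetical helpers, not part of the Fluo API; they only join the documented components with periods. The sample values (`myapp`, `worker-1`, `MyObserver`, `docs_indexed`) are made up. The sketch also rejects a `METRIC` containing '.', which is stricter than the doc's "avoid using period" advice; that check is an assumption added for illustration.

```java
// Hypothetical helper (not part of Fluo): builds metric names following the
// three schemes documented in metrics.md.
public final class MetricNames {

    private MetricNames() {}

    // The doc says METRIC should avoid '.', because a period would add an
    // extra dotted segment. This sketch chooses to reject such names outright.
    private static String checkMetric(String metric) {
        if (metric.isEmpty() || metric.contains(".")) {
            throw new IllegalArgumentException("metric must be non-empty and contain no '.': " + metric);
        }
        return metric;
    }

    // fluo.class.APPLICATION.REPORTER_ID.METRIC.CLASS
    public static String classMetric(String app, String reporterId, String metric, String clazz) {
        return String.join(".", "fluo", "class", app, reporterId, checkMetric(metric), clazz);
    }

    // fluo.system.APPLICATION.REPORTER_ID.METRIC
    public static String systemMetric(String app, String reporterId, String metric) {
        return String.join(".", "fluo", "system", app, reporterId, checkMetric(metric));
    }

    // fluo.app.REPORTER_ID.METRIC
    public static String appMetric(String reporterId, String metric) {
        return String.join(".", "fluo", "app", reporterId, checkMetric(metric));
    }

    public static void main(String[] args) {
        // prints "fluo.class.myapp.worker-1.tx_collisions.MyObserver"
        System.out.println(classMetric("myapp", "worker-1", "tx_collisions", "MyObserver"));
        // prints "fluo.system.myapp.oracle-0.oracle_response_time"
        System.out.println(systemMetric("myapp", "oracle-0", "oracle_response_time"));
        // prints "fluo.app.worker-1.docs_indexed"
        System.out.println(appMetric("worker-1", "docs_indexed"));
    }
}
```

Note that `REPORTER_ID` outside YARN contains a hostname, which may itself contain periods, so these names are meant to be built and matched by prefix rather than split on '.'.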