Benjamin Mahler created MESOS-1036:
--------------------------------------
Summary: Implement a library for exposing statistical metrics.
Key: MESOS-1036
URL: https://issues.apache.org/jira/browse/MESOS-1036
Project: Mesos
Issue Type: Improvement
Components: stats
Reporter: Benjamin Mahler
At present, statistical metrics are reported only through dedicated,
component-specific endpoints, primarily these two:
{noformat}
/master/stats.json
/slave/stats.json
{noformat}
Additional endpoints have not been added (for example, for containerization
statistics, allocator statistics, or libprocess statistics) because of the
inherent difficulty: one must either expose this data up through these
higher-level endpoints or add a new endpoint for the component-specific statistics.
This is why the {{Statistics}} class in libprocess was created; however, it is
not currently used for any statistical reporting.
[~benjaminhindman] and I whiteboarded the kinds of abstractions we wanted
to build to make statistical reporting trivial from anywhere in the code.
First, create the notion of a {{Statistic}} or {{Metric}} object that can be
directly manipulated to store statistics, for example:
{code}
// In the Registrar initialization:
Metric storage_latency = statistics.create("registrar", "storage_latency");
// Recording an individual storage latency.
storage_latency.set(latency);
{code}
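To make the {{Metric}} idea concrete, here is a minimal C++ sketch, assuming a registry that hands out handles sharing storage with it. {{MetricRegistry}}, {{snapshot()}} and the "context/name" key format are hypothetical names chosen for illustration, not an existing libprocess API:

{code:cpp}
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch: Metric and MetricRegistry are illustrative names only.
// A Metric is a cheap handle that shares its storage with the registry, so
// code recording a value never has to route it up to a component endpoint.
class Metric
{
public:
  explicit Metric(std::shared_ptr<double> value) : value_(std::move(value)) {}

  // Records the latest observation; visible to the registry immediately.
  void set(double value) { *value_ = value; }

  double get() const { return *value_; }

private:
  std::shared_ptr<double> value_;
};

class MetricRegistry
{
public:
  // Returns a handle for "context/name", creating the metric on first use.
  Metric create(const std::string& context, const std::string& name)
  {
    assert(!context.empty() && !name.empty());
    std::shared_ptr<double>& slot = values_[context + "/" + name];
    if (!slot) {
      slot = std::make_shared<double>(0.0);
    }
    return Metric(slot);
  }

  // Copies every current value, e.g. to render one JSON endpoint for all
  // components instead of one stats.json per component.
  std::map<std::string, double> snapshot() const
  {
    std::map<std::string, double> result;
    for (const auto& entry : values_) {
      result[entry.first] = *entry.second;
    }
    return result;
  }

private:
  std::map<std::string, std::shared_ptr<double>> values_;
};
{code}

Because handles and the registry share storage, a single endpoint could render {{snapshot()}} for every component. The sketch is not thread-safe; a real version would need synchronization or would have to live inside a libprocess process.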
In addition to this, we wanted the notion of a {{Meter}}, which automatically
exposes a metered version of a statistic, for example:
{code}
Metric storage_latency = statistics.create("registrar", "storage_latency");
// Adds "storage_latency_average", which computes the average over the window.
statistics.meter(storage_latency, Average());
// Adds "storage_latency_p99"; percentile requires a non-trivial implementation.
statistics.meter(storage_latency, Percentile(99));
// Adds "storage_latency_maximum".
statistics.meter(storage_latency, Maximum());
{code}
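A minimal sketch of the metering idea, assuming each meter is simply a named function over the values recorded for one metric. {{MeteredMetric}}, {{evaluate()}} and the nearest-rank definition of the percentile are assumptions for illustration; {{Average}}, {{Maximum}} and {{Percentile}} only borrow their names from the example above:

{code:cpp}
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: a Meter is a named function over the recorded values;
// it is published as "<metric>_<suffix>". Not an existing Mesos API.
struct Meter
{
  std::string suffix;
  std::function<double(const std::vector<double>&)> compute;
};

Meter Average()
{
  return Meter{"average", [](const std::vector<double>& values) {
    return std::accumulate(values.begin(), values.end(), 0.0) / values.size();
  }};
}

Meter Maximum()
{
  return Meter{"maximum", [](const std::vector<double>& values) {
    return *std::max_element(values.begin(), values.end());
  }};
}

// Nearest-rank percentile; sorts a copy, so cost grows with the history.
Meter Percentile(int p)
{
  assert(p > 0 && p <= 100);
  return Meter{"p" + std::to_string(p), [p](const std::vector<double>& values) {
    std::vector<double> sorted(values);
    std::sort(sorted.begin(), sorted.end());
    const double rank = std::ceil(p * static_cast<double>(sorted.size()) / 100.0);
    return sorted[static_cast<std::size_t>(rank) - 1];
  }};
}

class MeteredMetric
{
public:
  explicit MeteredMetric(std::string name) : name_(std::move(name)) {}

  // Records one observation (kept unbounded here for brevity).
  void set(double value) { values_.push_back(value); }

  void meter(Meter m) { meters_.push_back(std::move(m)); }

  // Evaluates every meter, e.g. {"storage_latency_p99": ...}; the meters
  // assume at least one value, so nothing is reported before that.
  std::map<std::string, double> evaluate() const
  {
    std::map<std::string, double> result;
    if (values_.empty()) {
      return result;
    }
    for (const Meter& m : meters_) {
      result[name_ + "_" + m.suffix] = m.compute(values_);
    }
    return result;
  }

private:
  std::string name_;
  std::vector<double> values_;
  std::vector<Meter> meters_;
};
{code}

With the values 1 through 100 recorded, {{evaluate()}} reports 50.5, 99 and 100 for the three meters. Storing every value keeps the sketch short, but it is exactly the kind of unbounded history the next paragraph argues against.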
Of course, I'm not advocating a particular API in the above examples; I'm just
laying out the kinds of things we wanted to see available.
As we add these types of abstractions, we will want to avoid storing large
time-series data in memory, as is currently done in {{Statistics}}. There are
several things to consider regarding the windowing technique, but I think the
notion of a window should change from "amount of history to be kept" to "a
statistical rolling window". For example, when computing an average, you would
most likely want a rolling 1-minute average rather than the average over a
2-week window.
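One way to get a rolling window without keeping the raw time series is to split the window into fixed-width buckets that each hold only a sum and a count. The sketch below assumes that technique (this issue does not prescribe one); {{RollingAverage}} and its parameters are hypothetical, and timestamps are passed in explicitly rather than read from a clock:

{code:cpp}
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a "statistical rolling window": memory is
// O(buckets) no matter how many values are recorded. Timestamps are time
// since an arbitrary epoch and are assumed to be non-negative and
// non-decreasing.
class RollingAverage
{
public:
  using Duration = std::chrono::milliseconds;

  RollingAverage(Duration window, std::size_t buckets)
    : width_(bucketWidth(window, buckets)), buckets_(buckets) {}

  void add(Duration now, double value)
  {
    const std::int64_t epoch = index(now);
    Bucket& bucket = buckets_[static_cast<std::size_t>(epoch) % buckets_.size()];
    if (bucket.epoch != epoch) {
      // The slot holds an interval that has left the window; reuse it.
      bucket.epoch = epoch;
      bucket.sum = 0.0;
      bucket.count = 0;
    }
    bucket.sum += value;
    bucket.count += 1;
  }

  // Average over the buckets covering (roughly) the last window up to
  // `now`; returns 0.0 if nothing was recorded in that span.
  double average(Duration now) const
  {
    const std::int64_t current = index(now);
    const std::int64_t span = static_cast<std::int64_t>(buckets_.size());
    double sum = 0.0;
    std::uint64_t count = 0;
    for (const Bucket& bucket : buckets_) {
      if (bucket.count > 0 && bucket.epoch <= current &&
          current - bucket.epoch < span) {
        sum += bucket.sum;
        count += bucket.count;
      }
    }
    return count == 0 ? 0.0 : sum / static_cast<double>(count);
  }

private:
  struct Bucket
  {
    std::int64_t epoch = -1;  // which width-sized interval this slot holds
    double sum = 0.0;
    std::uint64_t count = 0;
  };

  static Duration bucketWidth(Duration window, std::size_t buckets)
  {
    assert(buckets > 0);
    const Duration width(window.count() / static_cast<Duration::rep>(buckets));
    assert(width.count() > 0);
    return width;
  }

  std::int64_t index(Duration now) const { return now.count() / width_.count(); }

  Duration width_;
  std::vector<Bucket> buckets_;
};
{code}

For example, a 1-minute window with 60 one-second buckets uses 60 small structs however many latencies are recorded. The trade-off is that the oldest bucket expires all at once, so the window boundary is only accurate to one bucket width.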
This library will need to be efficient to avoid high RSS (resident set size) overhead.
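Percentiles are the hardest meter to bound in memory. One well-known option, swapped in here as an illustration rather than something this issue proposes, is reservoir sampling (Algorithm R): keep a fixed-size uniform sample and compute the percentile over it. {{ReservoirPercentile}}, its capacity and its fixed seed are assumptions:

{code:cpp}
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical sketch: estimate percentiles from a fixed-size uniform sample
// so memory stays at `capacity` doubles regardless of how many values arrive.
class ReservoirPercentile
{
public:
  explicit ReservoirPercentile(std::size_t capacity, std::uint32_t seed = 42)
    : capacity_(capacity), generator_(seed)
  {
    assert(capacity_ > 0);
    sample_.reserve(capacity_);
  }

  void add(double value)
  {
    ++seen_;
    if (sample_.size() < capacity_) {
      sample_.push_back(value);
      return;
    }
    // Keep the new value with probability capacity / seen by replacing a
    // uniformly chosen slot (Algorithm R).
    std::uniform_int_distribution<std::uint64_t> pick(0, seen_ - 1);
    const std::uint64_t j = pick(generator_);
    if (j < capacity_) {
      sample_[static_cast<std::size_t>(j)] = value;
    }
  }

  // Nearest-rank percentile of the sample; exact while seen() <= capacity.
  double percentile(double p) const
  {
    assert(!sample_.empty() && p > 0.0 && p <= 100.0);
    std::vector<double> sorted(sample_);
    std::sort(sorted.begin(), sorted.end());
    const double rank = std::ceil(p * static_cast<double>(sorted.size()) / 100.0);
    return sorted[static_cast<std::size_t>(rank) - 1];
  }

  std::uint64_t seen() const { return seen_; }
  std::size_t sampled() const { return sample_.size(); }

private:
  std::size_t capacity_;
  std::mt19937 generator_;
  std::vector<double> sample_;
  std::uint64_t seen_ = 0;
};
{code}

Results are exact until more than {{capacity}} values have been seen and are estimates afterwards. A plain reservoir samples the whole history rather than a rolling window, so a rolling p99 would also need some form of time-based expiry.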