Benjamin Mahler created MESOS-1036:
--------------------------------------

             Summary: Implement a library for exposing statistical metrics.
                 Key: MESOS-1036
                 URL: https://issues.apache.org/jira/browse/MESOS-1036
             Project: Mesos
          Issue Type: Improvement
          Components: stats
            Reporter: Benjamin Mahler


Currently, statistical metrics are reported only through dedicated endpoints 
for each component, primarily the following two:

{noformat}
/master/stats.json
/slave/stats.json
{noformat}

Statistics for other components (for example, containerization, the 
allocator, libprocess) have not been exposed because doing so is awkward: one 
must either plumb the data up to these higher-level endpoints, or add a new 
endpoint for the component-specific statistics.

This is why the {{Statistics}} class in libprocess was created; however, it is 
not currently used for any statistical reporting.

[~benjaminhindman] and I whiteboarded the kinds of abstractions we want to 
build to make statistical reporting trivial from anywhere in the code:

Create the notion of a {{Statistic}} or {{Metric}} object that can be directly 
manipulated to store statistics, for example:

{code}
// In the Registrar initialization:
Metric storage_latency = statistics.create("registrar", "storage_latency");

// Recording an individual storage latency.
storage_latency.set(latency);
{code}
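
To make the idea above a bit more concrete, here is a rough, self-contained sketch of how a {{Metric}} handle and its registry could fit together. Everything in it is an assumption for illustration: the {{Metric}} and {{Statistics}} classes below are hypothetical (this is not the existing libprocess {{Statistics}} class), the "context/name" key format is made up, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical handle: a cheap, copyable reference to one stored value.
class Metric {
public:
  explicit Metric(std::shared_ptr<double> value) : value_(std::move(value)) {}

  void set(double value) { *value_ = value; }
  double get() const { return *value_; }

private:
  std::shared_ptr<double> value_;
};

// Hypothetical registry: owns every value so that a single endpoint
// could report all of them without per-component plumbing.
class Statistics {
public:
  // Returns a handle for "context/name", creating the value on first use.
  Metric create(const std::string& context, const std::string& name) {
    assert(!context.empty() && !name.empty());
    std::shared_ptr<double>& slot = values_[context + "/" + name];
    if (!slot) {
      slot = std::make_shared<double>(0.0);
    }
    return Metric(slot);
  }

  // Copies the current values, e.g. for serializing to JSON.
  std::map<std::string, double> snapshot() const {
    std::map<std::string, double> result;
    for (const auto& entry : values_) {
      result[entry.first] = *entry.second;
    }
    return result;
  }

private:
  std::map<std::string, std::shared_ptr<double>> values_;
};
```

With this shape, a component only holds a {{Metric}} and calls {{set}}, while whatever serves the statistics only needs {{snapshot()}}.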

In addition to this, we wanted the notion of a {{Meter}}, which automatically 
exposes a metered version of a statistic, for example:

{code}
Metric storage_latency = statistics.create("registrar", "storage_latency");

// Adds "storage_latency_average" which computes average over the window.
statistics.meter(storage_latency, Average());

// Adds a "storage_latency_p99", percentile is a non-trivial implementation.
statistics.meter(storage_latency, Percentile(99));

// Adds a "storage_latency_maximum"
statistics.meter(storage_latency, Maximum());
{code}
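
As a sketch of what the meters themselves might compute, assuming each meter simply reduces the samples currently in its window to one value: the {{Meter}} interface below is hypothetical, and {{Percentile}} uses the simple nearest-rank definition over stored samples, which is only one of several possible definitions and not the memory-efficient approach a real implementation would likely need.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical interface: reduces the samples in a window to one value.
// Samples are taken by value because some meters reorder them.
struct Meter {
  virtual ~Meter() {}
  virtual double compute(std::vector<double> samples) const = 0;
};

struct Average : Meter {
  double compute(std::vector<double> samples) const override {
    assert(!samples.empty());
    double sum = 0.0;
    for (double sample : samples) {
      sum += sample;
    }
    return sum / samples.size();
  }
};

struct Maximum : Meter {
  double compute(std::vector<double> samples) const override {
    assert(!samples.empty());
    return *std::max_element(samples.begin(), samples.end());
  }
};

struct Percentile : Meter {
  explicit Percentile(double p) : p_(p) { assert(p > 0.0 && p <= 100.0); }

  // Nearest-rank: the smallest sample such that at least p percent
  // of all samples are less than or equal to it.
  double compute(std::vector<double> samples) const override {
    assert(!samples.empty());
    double n = static_cast<double>(samples.size());
    std::size_t rank = static_cast<std::size_t>(std::ceil(p_ * n / 100.0));
    if (rank < 1) {
      rank = 1;
    }
    // Partially sorts so that the element at index rank - 1 is in place.
    std::nth_element(
        samples.begin(), samples.begin() + (rank - 1), samples.end());
    return samples[rank - 1];
  }

private:
  double p_;
};
```

Note that {{Percentile}} here needs every sample in the window, which is exactly the kind of memory cost discussed below; a production version would probably use an approximate summary instead.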

Of course, I'm not advocating a particular API in the above examples; I'm just 
laying out the types of things we want to see available.

As we add these types of abstractions, we will want to avoid storing large time 
series data in memory as is currently done in {{Statistics}}. There are a 
number of things to consider with respect to the windowing technique, but I 
think the notion of a window should transition from "amount of history to be 
kept" to "a statistical rolling window". For example, when computing an 
average, you would most likely want a rolling 1 minute average, as opposed to 
the average for a 2 week window.
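
To illustrate the "statistical rolling window" idea, here is a minimal sketch of a rolling average that keeps one (sum, count) pair per second instead of every sample, so its memory use depends only on the window length and not on the sample rate. The {{RollingAverage}} class, the one-second bucket granularity, and passing the clock in as whole seconds are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rolling average over the last `windowSeconds` seconds.
class RollingAverage {
public:
  explicit RollingAverage(std::size_t windowSeconds) {
    assert(windowSeconds > 0);
    Bucket empty = {-1, 0.0, 0};
    buckets_.assign(windowSeconds, empty);
  }

  // Records `value` as observed at time `now` (seconds, non-negative).
  void record(long long now, double value) {
    assert(now >= 0);
    Bucket& bucket = buckets_[static_cast<std::size_t>(now) % buckets_.size()];
    if (bucket.second != now) {
      // The slot holds data from an older second: reuse it.
      bucket.second = now;
      bucket.sum = 0.0;
      bucket.count = 0;
    }
    bucket.sum += value;
    bucket.count++;
  }

  // Average of the samples recorded in (now - window, now]; 0 if none.
  double average(long long now) const {
    const long long window = static_cast<long long>(buckets_.size());
    double sum = 0.0;
    std::size_t count = 0;
    for (const Bucket& bucket : buckets_) {
      if (bucket.second > now - window && bucket.second <= now) {
        sum += bucket.sum;
        count += bucket.count;
      }
    }
    return count == 0 ? 0.0 : sum / count;
  }

private:
  struct Bucket {
    long long second;   // Which second this bucket currently represents.
    double sum;         // Sum of the samples recorded in that second.
    std::size_t count;  // Number of samples recorded in that second.
  };

  std::vector<Bucket> buckets_;
};
```

A 1 minute window here costs 60 small buckets no matter how many samples arrive. The same trick works for a maximum (keep a per-bucket maximum), but not directly for percentiles.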

This library will need to be memory-efficient to avoid adding high RSS overhead.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
