Hi all,

While this proposal is a very small change code-wise, it would employ the
libprocess HTTP routing logic in an (afaik) unprecedented way, so I wanted
to open it up for discussion.

# Motivation

Currently, the only way to access libprocess metrics is via the
`metrics/snapshot` endpoint, which returns the current values of all
installed metrics.

If the caller is only interested in a specific metric, or a subset of the
metrics, this is wasteful in two ways: first, the process has to do extra
work to collect all metrics, and second, the caller has to do extra work
to filter out the unneeded ones.

# Proposal
I'm proposing to allow the `/metrics/` endpoint to be followed by an
arbitrary path. The returned JSON object will contain only those metrics
whose key begins with the specified path:

    `/metrics` -> Return all metrics
    `/metrics/master/messages` -> Return all metrics beginning with
        `master/messages`, e.g. `master/messages_launch_tasks`, etc.

A proof of concept implementation can be found here:
https://reviews.apache.org/r/70211
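
To make the intended semantics concrete, here is a minimal sketch of the
prefix filter in Python. It is an illustration only, not the code from the
review above; the helper names and the sample metric values are made up,
and I'm assuming that a leading `/` after the endpoint name is not part of
the prefix.

```python
def prefix_from_path(path, endpoint="/metrics"):
    """Return the part of the URL path that follows the endpoint name.

    Hypothetical helper: a bare `/metrics` yields an empty prefix.
    """
    if not path.startswith(endpoint):
        raise ValueError("path does not belong to this endpoint")
    # Drop the endpoint name and the slash separating it from the prefix.
    return path[len(endpoint):].lstrip("/")


def filter_by_prefix(metrics, prefix):
    """Keep only the metrics whose key begins with `prefix`.

    The metric names stay in a flat namespace; an empty prefix matches all.
    """
    return {key: value for key, value in metrics.items()
            if key.startswith(prefix)}


# Made-up sample values, for illustration only.
SAMPLE_METRICS = {
    "master/messages_launch_tasks": 3,
    "master/messages_kill_task": 1,
    "master/tasks_killed": 1,
    "system/cpus_total": 8,
}

if __name__ == "__main__":
    prefix = prefix_from_path("/metrics/master/messages")
    for name, value in sorted(filter_by_prefix(SAMPLE_METRICS, prefix).items()):
        print(name, value)
```

Note that this is a plain string-prefix match, so `master/messages` also
matches `master/messages_launch_tasks` even though the prefix does not end
at a `/` boundary.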

# Discussion
The current naming convention for metrics, e.g. `master/tasks_killed`,
suggests to the casual observer that metrics are stored and accessible in a
hierarchical manner. Using a prefix filter allows users to filter certain
parts of the metrics as if they were indeed hierarchical, while still
allowing libprocess to use a flat namespace for all metric names internally.

The method of access, using the URL path directly instead of a query
parameter, is unusual, but it has the advantage that, in my observations, it
matches what people intuitively try to do anyway when they want to access
a subset of metrics.

Another drawback is that all other routes of the MetricsProcess will
shadow the corresponding filter value, e.g. right now it would not be
possible to return all metrics whose names begin with 'snapshot/'.
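
The shadowing can be illustrated with a small Python sketch. This is a
simplified model under my own assumptions, not the actual libprocess route
lookup: it assumes that a route registered under the endpoint always wins
over the prefix filter, and `FIXED_ROUTES` and `resolve` are hypothetical
names.

```python
# Routes registered below /metrics. Only `snapshot` is taken from this
# proposal; the set itself is a hypothetical stand-in for the route table.
FIXED_ROUTES = {"snapshot"}


def resolve(path, endpoint="/metrics"):
    """Classify a request path as a fixed route or as a prefix filter.

    Assumption: a registered route takes precedence over the filter.
    """
    rest = path[len(endpoint):].lstrip("/")
    first_segment = rest.split("/", 1)[0]
    if first_segment in FIXED_ROUTES:
        # The registered route handles the request, so a filter value
        # starting with the same segment can never be expressed.
        return ("route", first_segment)
    return ("filter", rest)


if __name__ == "__main__":
    print(resolve("/metrics/master/messages"))
    print(resolve("/metrics/snapshot/foo"))
```

Under this model, `/metrics/snapshot/foo` is dispatched to the `snapshot`
route instead of being treated as the filter `snapshot/foo`.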

# Alternatives
1) Add a `prefix` parameter to the `snapshot` endpoint, i.e.

    `/metrics/snapshot?prefix=/master/cpu`

This is more in line with how we classically do libprocess endpoints, but
from a UI perspective it's hard to discover: many people, including some
Mesos developers, already have trouble remembering to append `/snapshot` to
get the metrics, so requiring them to memorize an additional parameter does
not seem nice.

2) Move the dynamic prefix under some other endpoint `/values`, i.e.

    `/metrics/values/master/messages`

The main disadvantage is that `/values` (with an empty filter) and
`/snapshot` would return exactly the same data, raising the question of why
both are needed.


What do you think? I'm looking forward to hearing your thoughts, ideas, etc.

Best regards,
-- 
Benno Evers
Software Engineer, Mesosphere
