Benjamin Mahler created MESOS-9235:
--------------------------------------
Summary: Add per-Process event queue counters in libprocess.
Key: MESOS-9235
URL: https://issues.apache.org/jira/browse/MESOS-9235
Project: Mesos
Issue Type: Improvement
Components: libprocess, metrics
Reporter: Benjamin Mahler
Currently, a few Processes have one-off event queue size metrics computed using
PullGauges. This approach has several known disadvantages:
* Getting event queue size metrics for a Process requires changing code and
re-compiling.
* Because the pull gauge dispatches onto the Process, it slows down metrics
responses, and it reports the queue size only after the queue has been flushed
of all messages that arrived before the pull gauge dispatch (see MESOS-8914).
* The use of a single "size" metric means that one cannot observe the overall
enqueue and dequeue throughput.
These one-off metrics can be replaced by introducing first-class support in
libprocess for event queue metrics. For queue size / throughput, we can take
the following approach:
* Use configuration to opt-in to metrics for Processes of interest. E.g.
specify "master,allocator" to enable metrics for those Processes.
* Expose a pair of counters for "enqueued" and "dequeued" messages. The user
can compute the queue size by subtracting "dequeued" from "enqueued". For
better usability, we could also expose size as a pull gauge that either
subtracts the two counters (prone to races, since they are not read together
atomically) or inspects the queue size directly without a trip through the
queue.
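
A minimal sketch of the approach above, using only the C++ standard library.
The names EventQueueCounters, onEnqueue, onDequeue and parseOptIn are
hypothetical and are not existing libprocess APIs. The sketch only shows how a
pair of monotonic counters yields both throughput and an approximate size, and
how an opt-in list such as "master,allocator" might be parsed:

```cpp
#include <atomic>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>

// Hypothetical per-Process counters (an assumption for illustration,
// not an existing libprocess type).
struct EventQueueCounters
{
  std::atomic<uint64_t> enqueued{0};
  std::atomic<uint64_t> dequeued{0};

  // Called when an event is pushed onto the Process's queue.
  void onEnqueue() { enqueued.fetch_add(1); }

  // Called when an event is popped from the Process's queue.
  void onDequeue() { dequeued.fetch_add(1); }

  // Approximate queue size. The two loads are not one atomic snapshot,
  // so the result can be stale; clamp at zero rather than underflow.
  uint64_t size() const
  {
    const uint64_t d = dequeued.load();
    const uint64_t e = enqueued.load();
    return e >= d ? e - d : 0;
  }
};

// Parses a comma-separated opt-in list of Process names,
// e.g. "master,allocator". Empty entries are ignored.
std::set<std::string> parseOptIn(const std::string& value)
{
  std::set<std::string> ids;
  std::istringstream in(value);
  std::string id;
  while (std::getline(in, id, ',')) {
    if (!id.empty()) {
      ids.insert(id);
    }
  }
  return ids;
}
```

Because both counters only ever increase, sampling them at two points in time
gives enqueue and dequeue throughput over that interval, which a single "size"
metric cannot provide.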