Benjamin Mahler created MESOS-9235:
--------------------------------------

             Summary: Add per-Process event queue counters in libprocess.
                 Key: MESOS-9235
                 URL: https://issues.apache.org/jira/browse/MESOS-9235
             Project: Mesos
          Issue Type: Improvement
          Components: libprocess, metrics
            Reporter: Benjamin Mahler


Currently, a few Processes have one-off event queue size metrics computed using 
PullGauges. This approach has several known disadvantages:

* Getting event queue size metrics for a Process requires changing code and 
re-compiling.
* A pull gauge that dispatches onto the Process slows down metrics responses. 
It also reports the queue size only after the Process has worked through every 
message that arrived before the gauge's dispatch (see MESOS-8914).
* The use of a single "size" metric means that one cannot observe the overall 
enqueue and dequeue throughput.

These can be replaced by introducing first-class support in libprocess for 
event queue metrics. For queue size / throughput, we can take the following 
approach:

* Use configuration to opt-in to metrics for Processes of interest. E.g. 
specify "master,allocator" to enable metrics for those Processes.
* Expose a pair of counters for "enqueued" and "dequeued" messages. Users can 
calculate the queue size by subtracting "dequeued" from "enqueued". For better 
usability, we could also expose size as a pull gauge that either subtracts the 
two values (prone to racing) or inspects the queue size directly, without a 
trip through the queue.
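
To make the proposal above concrete, here is a minimal standalone sketch of the 
two pieces: a comma-separated opt-in list and a pair of monotonic counters from 
which size is derived. None of this is existing libprocess API. The names 
EventQueueCounters, onEnqueue, onDequeue, size and parseOptIn are hypothetical, 
and the sketch assumes the counters would be bumped from wherever a Process's 
event queue is pushed to and popped from.

```cpp
#include <atomic>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>

// Hypothetical per-Process counters (illustration only, not libprocess API).
// Both values only ever increase, so a metrics reader can sample them
// without dispatching onto the Process.
struct EventQueueCounters
{
  std::atomic<uint64_t> enqueued{0};
  std::atomic<uint64_t> dequeued{0};

  // Assumed to be called by whoever pushes an event onto the queue.
  void onEnqueue() { enqueued.fetch_add(1, std::memory_order_relaxed); }

  // Assumed to be called when the Process pops an event off the queue.
  void onDequeue() { dequeued.fetch_add(1, std::memory_order_relaxed); }

  // Approximate queue size. The two loads are not one atomic snapshot,
  // so the result can be stale under concurrency (the "prone to racing"
  // caveat). The clamp guards against underflow if the reads interleave
  // badly with concurrent updates.
  uint64_t size() const
  {
    const uint64_t d = dequeued.load(std::memory_order_acquire);
    const uint64_t e = enqueued.load(std::memory_order_acquire);
    return e >= d ? e - d : 0;
  }
};

// Parses an opt-in value such as "master,allocator" into a set of
// Process names. Empty entries are skipped. How the value reaches
// libprocess (flag, environment variable, etc.) is left open here.
std::set<std::string> parseOptIn(const std::string& value)
{
  std::set<std::string> names;
  std::istringstream stream(value);
  std::string name;
  while (std::getline(stream, name, ',')) {
    if (!name.empty()) {
      names.insert(name);
    }
  }
  return names;
}
```

With counters like these, throughput falls out of two samples: the enqueue rate 
over an interval is the difference between two "enqueued" readings divided by 
the elapsed time, and likewise for "dequeued". A single size gauge cannot 
provide that.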



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
