[ 
https://issues.apache.org/jira/browse/MESOS-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127030#comment-16127030
 ] 

Alexander Rukletsov commented on MESOS-7819:
--------------------------------------------

Some notes and food for thought.

Things to consider when adding metrics:
* once we add a metric, it is very hard to remove it (tooling and operators 
rely on it)
* avoid exposing implementation details, e.g., number of HttpProxies
* usefulness of a metric, i.e., its value can trigger an action
* performance
* security (exposing sensitive data)

Metrics to consider as part of this ticket:
* number of connections, active / idle connections, keep-alive / persistent 
connections?
* number of pending messages per socket? for all sockets?
* number of active actors
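
To make the active / idle split concrete, here is a minimal sketch of what 
such gauges could track. Everything in it is hypothetical: {{ConnectionGauges}} 
and its methods do not exist in libprocess, and it assumes at most one request 
in flight per connection and external synchronization by the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical aggregate gauges; none of these names exist in libprocess.
// Not thread-safe: the caller is assumed to serialize access.
struct ConnectionGauges
{
  std::size_t active = 0; // Connections with a request in flight.
  std::size_t idle = 0;   // Open connections with nothing in flight.

  // Total number of open connections.
  std::size_t total() const { return active + idle; }

  // A newly accepted connection starts out idle.
  void opened() { ++idle; }

  // Assumption: at most one request in flight per connection.
  void requestStarted()
  {
    assert(idle > 0);
    --idle;
    ++active;
  }

  void requestFinished()
  {
    assert(active > 0);
    --active;
    ++idle;
  }

  // Only idle connections are closed in this simplified model.
  void closedIdle()
  {
    assert(idle > 0);
    --idle;
  }
};
```

Only the three totals would be exposed, so nothing about HttpProxy or other 
internals leaks into the metric names.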

For a socket, we can have an HttpProxy. For which requests do we not create 
one? Is it true that we create a proxy for all HTTP connections? In 
{{ProcessManager::handle()}} we proxy both HTTP and libprocess requests. Does 
it make sense to introduce a metric for the number of HTTP connections?

Each HTTP connection has two queues: requests to process ({{HttpProxy::items}}) 
and chunks of data to send ({{SocketManager::outgoing}}), though the latter 
exists for any socket. It probably does not make sense to expose per-socket 
metrics (hard to monitor?). Does it make sense to expose an aggregate across 
all sockets? Per socket type? Will it be useful to monitor?
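
As an illustration of the aggregate option, the sketch below collapses 
per-socket queue lengths into two numbers: the total pending across all 
sockets and the length of the longest single queue. {{OutgoingQueues}}, 
{{totalPending}} and {{maxPending}} are hypothetical names, and the map keyed 
by file descriptor is only a stand-in for illustration, not the actual type 
of {{SocketManager::outgoing}}.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <queue>
#include <string>

// Hypothetical stand-in for per-socket outgoing queues, keyed by socket fd.
using OutgoingQueues = std::map<int, std::queue<std::string>>;

// Sums the queue lengths so that a single gauge covers all sockets.
std::size_t totalPending(const OutgoingQueues& outgoing)
{
  std::size_t total = 0;
  for (const auto& entry : outgoing) {
    total += entry.second.size();
  }
  return total;
}

// Returns the length of the longest queue, which hints at one stuck socket
// without exposing a metric per socket.
std::size_t maxPending(const OutgoingQueues& outgoing)
{
  std::size_t longest = 0;
  for (const auto& entry : outgoing) {
    longest = std::max<std::size_t>(longest, entry.second.size());
  }
  return longest;
}
```

Whether the maximum is worth exposing next to the sum is an open question; it 
is included here only because a sum alone cannot distinguish many slightly 
backed-up sockets from a single badly backed-up one.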

Do we want dedicated metrics for persistent connections? Is that too 
fine-grained? Is it clear to an operator what a persistent connection in Mesos 
is (i.e., one we have "linked", but when and what do we link)?

> Libprocess internal state is not monitored by metrics.
> ------------------------------------------------------
>
>                 Key: MESOS-7819
>                 URL: https://issues.apache.org/jira/browse/MESOS-7819
>             Project: Mesos
>          Issue Type: Improvement
>          Components: libprocess
>            Reporter: Alexander Rukletsov
>              Labels: metrics, newbie++
>
> Libprocess does not expose its internal state via metrics. Active sockets, 
> number of HTTP proxies, number of running actors, number of pending messages 
> for all active sockets, etc — may be of interest when monitoring and 
> debugging Mesos clusters.



