[ https://issues.apache.org/jira/browse/MESOS-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127030#comment-16127030 ]
Alexander Rukletsov commented on MESOS-7819:
--------------------------------------------
Some notes and food for thought.
Things to consider when adding metrics:
* once we add a metric, it is very hard to remove it (tooling and operators
rely on it)
* avoid exposing implementation details, e.g., number of HttpProxies
* usefulness of a metric, i.e., its value can trigger an action
* performance
* security (exposing sensitive data)
Metrics to consider as part of this ticket:
* number of connections, active / idle connections, keep-alive / persistent
connections?
* number of pending messages per socket? In aggregate across all sockets?
* number of active actors
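To make the list above concrete, here is a minimal sketch of process-wide gauges for connections (split into active and idle) and actors. Everything in it is an assumption for illustration: the class `LibprocessGauges`, its hook names, and the metric keys are hypothetical and are not existing libprocess APIs or Mesos metrics. It also leaves open what "active" means for a connection, which is one of the questions above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry of process-wide libprocess gauges. The metric
// keys returned by snapshot() are illustrative, not real Mesos metrics.
class LibprocessGauges {
public:
  // New connections are assumed to start out idle.
  void socketOpened() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++idle_;
  }

  // A connection moves from idle to active when it starts carrying traffic.
  void socketActivated() {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(idle_ > 0);
    --idle_;
    ++active_;
  }

  // ...and back to idle when it stops.
  void socketDeactivated() {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(active_ > 0);
    --active_;
    ++idle_;
  }

  void socketClosed(bool wasActive) {
    std::lock_guard<std::mutex> lock(mutex_);
    uint64_t& bucket = wasActive ? active_ : idle_;
    assert(bucket > 0);
    --bucket;
  }

  void actorSpawned() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++actors_;
  }

  void actorTerminated() {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(actors_ > 0);
    --actors_;
  }

  // Reading everything under one lock gives a consistent snapshot, so the
  // total always equals active + idle.
  std::map<std::string, uint64_t> snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return {
        {"libprocess/connections", active_ + idle_},
        {"libprocess/connections/active", active_},
        {"libprocess/connections/idle", idle_},
        {"libprocess/actors", actors_},
    };
  }

private:
  mutable std::mutex mutex_;
  uint64_t active_ = 0;
  uint64_t idle_ = 0;
  uint64_t actors_ = 0;
};
```

A single mutex keeps the sketch obviously correct; given the performance concern above, a real implementation might prefer atomics at the cost of slightly inconsistent snapshots.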
For a socket, we can have an HttpProxy. For which requests do we not create one?
Is it true that we create a proxy for every HTTP connection? In
{{ProcessManager::handle()}} we proxy both HTTP and libprocess requests. Does
it make sense to introduce a metric for the number of HTTP connections?
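If it does turn out that every HTTP connection owns exactly one proxy, an HTTP connection gauge could be tied to the proxy's lifetime. The sketch below assumes that; `httpConnections`, `HttpConnectionGuard`, and `ProxySketch` are hypothetical names, and `ProxySketch` merely stands in for `HttpProxy`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical process-wide gauge of open HTTP connections.
std::atomic<int64_t> httpConnections{0};

// RAII guard: the gauge goes up when the owning object is constructed and
// down when it is destroyed, including on early-teardown error paths.
class HttpConnectionGuard {
public:
  HttpConnectionGuard() {
    httpConnections.fetch_add(1, std::memory_order_relaxed);
  }

  ~HttpConnectionGuard() {
    int64_t previous = httpConnections.fetch_sub(1, std::memory_order_relaxed);
    assert(previous > 0);
    (void)previous;  // Unused when NDEBUG disables assert.
  }

  HttpConnectionGuard(const HttpConnectionGuard&) = delete;
  HttpConnectionGuard& operator=(const HttpConnectionGuard&) = delete;
};

// Stand-in for HttpProxy (assumed one per HTTP connection); only the
// guard member matters for this sketch.
class ProxySketch {
  HttpConnectionGuard guard_;
};
```

Tying the count to object lifetime avoids sprinkling increments and decrements across connection setup and teardown code. The trade-off is that the metric then counts proxies rather than connections, which is exactly the implementation detail the first list warns against exposing unless the two are known to match.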
Each HTTP connection has two queues: requests to process ({{HttpProxy::items}})
and chunks of data to send ({{SocketManager::outgoing}}), though the latter
exists for every socket. It probably does not make sense to expose per-socket
metrics (hard to monitor?). Does it make sense to expose an aggregate across all
sockets, or per socket type? Would that be useful to monitor?
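One way to expose an aggregate without walking every socket at scrape time is to maintain running totals next to the per-socket queues. The sketch below is loosely modeled on the idea of {{SocketManager::outgoing}} but is not its real structure: `OutgoingQueues`, `SocketType`, and the two-way HTTP/libprocess split are assumptions. It is also single-threaded for brevity; a real version would need the same synchronization as the queues it mirrors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical socket categories for a per-type breakdown.
enum class SocketType { Http = 0, Libprocess = 1 };

// Hypothetical per-socket outgoing queues that also keep running totals,
// so reading the aggregate is O(1) instead of a walk over every socket.
class OutgoingQueues {
public:
  void enqueue(int socket, SocketType type, std::string data) {
    auto it = queues_.find(socket);
    if (it == queues_.end()) {
      it = queues_.emplace(socket, Entry{type, {}}).first;
    }
    // A socket is assumed to keep one type for its whole lifetime.
    assert(it->second.type == type);
    it->second.pending.push_back(std::move(data));
    ++totalByType_[index(type)];
  }

  // Removes the oldest pending item for `socket`; returns false if none.
  bool dequeue(int socket) {
    auto it = queues_.find(socket);
    if (it == queues_.end() || it->second.pending.empty()) {
      return false;
    }
    it->second.pending.pop_front();
    --totalByType_[index(it->second.type)];
    return true;
  }

  // Closing a socket drops its queue and its share of the totals.
  void close(int socket) {
    auto it = queues_.find(socket);
    if (it == queues_.end()) {
      return;
    }
    std::size_t& total = totalByType_[index(it->second.type)];
    assert(total >= it->second.pending.size());
    total -= it->second.pending.size();
    queues_.erase(it);
  }

  std::size_t pendingTotal() const {
    return totalByType_[0] + totalByType_[1];
  }

  std::size_t pendingFor(SocketType type) const {
    return totalByType_[index(type)];
  }

private:
  struct Entry {
    SocketType type;
    std::deque<std::string> pending;
  };

  static std::size_t index(SocketType type) {
    return static_cast<std::size_t>(type);
  }

  std::unordered_map<int, Entry> queues_;
  std::array<std::size_t, 2> totalByType_{};
};
```

This gives both an all-sockets total and a per-type split while keeping the per-socket detail internal, which matches the concern that per-socket metrics would be hard to monitor.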
Do we want a dedicated metric for persistent connections? Is it too
fine-grained? Is it clear to an operator what a persistent connection in Mesos
is (i.e., one we have "linked"; but when do we link, and what)?
> Libprocess internal state is not monitored by metrics.
> ------------------------------------------------------
>
> Key: MESOS-7819
> URL: https://issues.apache.org/jira/browse/MESOS-7819
> Project: Mesos
> Issue Type: Improvement
> Components: libprocess
> Reporter: Alexander Rukletsov
> Labels: metrics, newbie++
>
> Libprocess does not expose its internal state via metrics. Active sockets,
> number of HTTP proxies, number of running actors, number of pending messages
> for all active sockets, etc., may be of interest when monitoring and
> debugging Mesos clusters.