[
https://issues.apache.org/jira/browse/KUDU-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated KUDU-2144:
------------------------------
Status: In Review (was: Open)
> Add metric for reactor load
> ---------------------------
>
> Key: KUDU-2144
> URL: https://issues.apache.org/jira/browse/KUDU-2144
> Project: Kudu
> Issue Type: Improvement
> Components: metrics, ops-tooling
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> Recently I was debugging a cluster that appeared to have network issues. Only
> after lots of investigation did I realize that the reactor threads were not
> keeping up with network traffic due to hitting KUDU-1964 (this cluster was
> running 1.3.0). At first glance the reactors did not seem busy, since each
> was only using ~25% of a CPU -- however, the other 75% of the time was spent
> blocked on OpenSSL locks and not in epoll_wait as one would normally expect.
> This would be easier to diagnose if we had a metric showing how much time
> the reactors spend idle (i.e. in epoll_wait) vs. doing work (i.e. executing
> callbacks, IO, etc.). If any reactor spends a high percentage of its time
> outside epoll_wait, that suggests the reactors may be a bottleneck, increasing
> latency or degrading throughput.
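
A minimal sketch of the idle-vs-busy accounting described above, not Kudu's actual implementation. The names ReactorLoadTracker, RecordIdle, RecordBusy, LoadPercent and RunOneIteration are hypothetical and exist only for illustration. The sketch uses wall-clock time rather than CPU time, so that time spent blocked on a lock still counts as "not idle", which is the case that CPU usage alone missed.

```cpp
#include <chrono>

// Hypothetical helper (not a Kudu class): accumulates the wall-clock time a
// reactor thread spends waiting for events versus doing anything else.
class ReactorLoadTracker {
 public:
  using Duration = std::chrono::nanoseconds;

  // Time spent blocked in the poll call (e.g. epoll_wait).
  void RecordIdle(Duration d) { idle_ += d; }

  // Time spent outside the poll call: callbacks, IO, or waiting on locks.
  void RecordBusy(Duration d) { busy_ += d; }

  // Percentage (0..100) of recorded time spent outside the poll call.
  // Returns 0 when nothing has been recorded yet.
  int LoadPercent() const {
    const auto total = idle_ + busy_;
    if (total.count() == 0) return 0;
    return static_cast<int>((busy_.count() * 100) / total.count());
  }

 private:
  Duration idle_{0};
  Duration busy_{0};
};

// One loop iteration, assuming `poll` stands in for the blocking wait and
// `work` for the callbacks that run afterwards. Both are caller-supplied.
template <typename PollFn, typename WorkFn>
void RunOneIteration(ReactorLoadTracker& tracker, PollFn poll, WorkFn work) {
  using Clock = std::chrono::steady_clock;
  const auto t0 = Clock::now();
  poll();
  const auto t1 = Clock::now();
  work();
  const auto t2 = Clock::now();
  tracker.RecordIdle(
      std::chrono::duration_cast<ReactorLoadTracker::Duration>(t1 - t0));
  tracker.RecordBusy(
      std::chrono::duration_cast<ReactorLoadTracker::Duration>(t2 - t1));
}
```

Under this accounting, a reactor that uses ~25% of a CPU and spends the other ~75% blocked on locks would report a load near 100%, since almost none of its time is spent in the poll call.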
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)