[ https://issues.apache.org/jira/browse/KUDU-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-2144:
------------------------------
    Status: In Review  (was: Open)

> Add metric for reactor load
> ---------------------------
>
>                 Key: KUDU-2144
>                 URL: https://issues.apache.org/jira/browse/KUDU-2144
>             Project: Kudu
>          Issue Type: Improvement
>          Components: metrics, ops-tooling
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> Recently I was debugging a cluster that appeared to have network issues. Only 
> after lots of investigation did I realize that the reactor threads were not 
> keeping up with network traffic due to hitting KUDU-1964 (this cluster was 
> running 1.3.0). At first glance the reactors did not seem busy, since each 
> was only using ~25% of a CPU -- however, the other 75% of the time was spent 
> blocked on OpenSSL locks and not in epoll_wait as one would normally expect.
> This would be easier to diagnose if we had a metric showing the amount of 
> time the reactors spend idle (i.e. in epoll_wait) vs doing work (i.e. executing 
> callbacks, I/O, etc). If any reactor is spending a high percentage of time not 
> in epoll_wait, that suggests the reactors may be a bottleneck that is increasing 
> latency or degrading throughput.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
