Todd Lipcon created KUDU-2144:
---------------------------------

             Summary: Add metric for reactor load
                 Key: KUDU-2144
                 URL: https://issues.apache.org/jira/browse/KUDU-2144
             Project: Kudu
          Issue Type: Improvement
          Components: metrics, ops-tooling
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon


Recently I was debugging a cluster that appeared to have network issues. Only 
after lots of investigation did I realize that the reactor threads were not 
keeping up with network traffic due to hitting KUDU-1964 (this cluster was 
running 1.3.0). At first glance the reactors did not seem busy, since each was 
only using ~25% of a CPU -- however, the other 75% of the time was spent 
blocked on OpenSSL locks and not in epoll_wait as one would normally expect.

This would be easier to diagnose if we had a metric showing the amount of time 
the reactors spend idle (i.e. in epoll_wait) vs doing work (i.e. executing 
callbacks, IO, etc). The metric should be based on wall-clock time rather than 
CPU time, so that time spent blocked on locks counts as non-idle. If any 
reactor is spending a high percentage of time outside of epoll_wait, that 
suggests the reactors may be a bottleneck, increasing latency or degrading 
throughput.
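
As a rough illustration of the proposed metric (not Kudu's actual 
implementation, which would live in the C++ reactor code), the sketch below 
splits a reactor loop's wall-clock time into "waiting in the poll call" and 
"everything else". The class name ReactorLoadTracker, its methods, and the 
run_once helper are hypothetical names made up for this example. Because it 
uses wall-clock time, time spent blocked on a lock inside a callback is 
counted as load, which is the case that CPU usage alone hid above.

```python
import selectors
import time


class ReactorLoadTracker:
    """Hypothetical helper: splits a reactor loop's wall-clock time into
    time waiting for events (idle) and all other time (active)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self._last = clock()
        self._idle = 0.0
        self._active = 0.0

    def before_poll(self):
        # Everything since the previous poll returned is non-idle time,
        # including time blocked on locks inside callbacks.
        now = self._clock()
        self._active += now - self._last
        self._last = now

    def after_poll(self):
        # Time spent inside the poll call itself is idle time.
        now = self._clock()
        self._idle += now - self._last
        self._last = now

    def load_percent(self):
        # Percentage of observed wall-clock time spent outside the poll call.
        total = self._idle + self._active
        return 0.0 if total == 0 else 100.0 * self._active / total


def run_once(selector, tracker, timeout=None):
    """One loop iteration; assumes each registered key's data is a callable
    taking (fileobj, mask)."""
    tracker.before_poll()
    events = selector.select(timeout)
    tracker.after_poll()
    for key, mask in events:
        key.data(key.fileobj, mask)
```

A reactor that reports a load near 100% while using only ~25% of a CPU would 
point directly at time lost to blocking rather than to real work.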



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
