We have a distributed configuration with a single gmetad collecting
data (via pull) from 9 remote gmonds, which in turn are fed (via
push) by local gmonds. So far so good, but on one cluster the remote
gmond consistently stops responding to TCP requests. As a result,
there are no updates at the gmetad site.
What I have found by debugging the gmond process is:
1) There is an open TCP connection from the gmetad server to the
gmond server. It does not seem to close.
2) The gmond server accepts UDP messages and processes them. I've
watched it in the debugger.
3) When I telnet to the gmond port, it connects, but there is no
response. In fact, the gmond server does not seem to call accept()
on this port.
4) Both gmetad and gmond are version 3.0.2.
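To repeat the telnet check from item 3 in a script, for example to run it periodically against all 9 remote gmonds, a minimal Python sketch like the one below could be used. It assumes gmond's default TCP port of 8649 (adjust to the tcp_accept_channel port in your gmond.conf) and that a healthy gmond writes its XML dump and then closes the connection. `probe_gmond` is a hypothetical helper name, not part of Ganglia. It returns None when the TCP connection succeeds but no data arrives before the timeout. That matches the symptom in item 3: the kernel can complete the handshake from the listen backlog even if the process never calls accept(), so "telnet connects" alone does not prove gmond is servicing the port.

```python
import socket
from typing import Optional


def probe_gmond(host: str, port: int = 8649, timeout: float = 5.0) -> Optional[bytes]:
    """Connect to a gmond TCP port and read whatever it sends.

    Assumption: port 8649 is the default gmond TCP port; a healthy gmond
    sends its XML dump and then closes the socket.

    Returns the bytes received, or None if the connection was accepted by
    the kernel but nothing arrived within `timeout` seconds.
    Raises OSError (e.g. ConnectionRefusedError) if the TCP connection
    itself cannot be established.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        try:
            while True:
                data = sock.recv(65536)
                if not data:  # peer closed the connection: dump is complete
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # stalled: no (further) data within the timeout
    return b"".join(chunks) or None
```

For example, `probe_gmond("gmond-host")` returning None while the UDP side is still being processed (item 2) would point at the TCP accept path in gmond rather than at gmetad.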
Any help would be appreciated.
Thanks,
Elliot