We have a distributed configuration with a single gmetad collecting data (via pull) from 9 remote gmonds, which in turn are fed (via push) by local gmonds. So far so good, but on one cluster the gmond consistently stops responding to TCP requests. The result is that there are no updates at the gmetad site.

What I have found by debugging the gmond process is:

1) There is an open TCP connection from the gmetad server to the gmond server. It does not seem to close.
2) The gmond server accepts UDP messages and processes them. I've watched it in the debugger.
3) When I telnet to the gmond port, it connects, but there is no response. In fact, the gmond server does not seem to call accept on this port.
4) Both gmetad and gmond are version 3.0.2.
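
The telnet check in item 3 can also be scripted, which makes it easier to repeat against each cluster. The following is only a minimal sketch: it assumes gmond listens on TCP port 8649 (the usual default; adjust it if gmond.conf says otherwise), and the function name probe_gmond and the host name in the usage note are placeholders, not part of Ganglia.

```python
import socket


def probe_gmond(host, port=8649, timeout=5.0):
    """Connect to a gmond TCP port and report whether any data comes back.

    The port default (8649) is an assumption; pass the real port if it differs.
    """
    try:
        # The timeout also applies to the later recv() on this socket.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Covers refused connections, unreachable hosts and connect timeouts.
        return f"connect failed: {exc}"

    with sock:
        try:
            data = sock.recv(4096)
        except socket.timeout:
            # Handshake completed, but nothing was sent before the timeout.
            return "connected, but no data before timeout"
        except OSError as exc:
            return f"read failed: {exc}"

    if not data:
        return "connected, but peer closed without sending data"
    return f"received {len(data)} bytes"
```

For example, print(probe_gmond("gmond-host.example")) against a healthy gmond would be expected to report received bytes, since gmond normally writes its XML as soon as a client connects. A result of "connected, but no data before timeout" matches the symptom above: the kernel completes the TCP handshake from the listen backlog even when the process never calls accept, so a successful connect alone does not show that gmond is servicing the port.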

Any help would be appreciated.
Thanks,
Elliot

