Hi,

(I'm using UDP, multicast, version 3.1.7 on Ubuntu 9 server edition).

I'm having a problem with heartbeats that go silent for random periods of time. I installed Ganglia with apt-get on a large network of virtual machines. gmond monitors run inside some 40+ virtual machines, and another gmond runs on a physical machine that also hosts a webserver + gmetad.

Behavior: In the web interface everything looks great for the first few minutes. Then gstat and the web interface start reporting that all the VMs are down. About 10 or 20 minutes after that, all the VM heartbeats "magically" come back to life out of nowhere.

These "silent periods" cycle continuously and never stop: up, then down, up, then down.

I've tried several things to debug the problem:

  1. gmond -d 3 on the VMs reports heartbeats continuously during the
     "downtimes" mentioned above, yet they do not seem to arrive at the
     main server's gmond monitor. (I'm grepping the "REPORTED" time
     inside the XML on the gmond port to verify.)
  2. Restarting the master gmond does not always solve the problem:
     heartbeats may go silent for as long as 10 or 20 minutes and then
     suddenly show up again as new metrics inside the XML grabbed from
     the master gmond monitor and inside the web interface.
  3. I'm definitely not having a connectivity problem - I can clearly
     see from tcpdump that multicast packets are making it to both
     sides. And the fact that the heartbeats *do* eventually "come
     back" after 10 or 20 minutes also confirms that.
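
For reference, the "REPORTED" check from item 1 can be scripted roughly as below. This is only a minimal sketch, not the exact commands used above: it assumes the master gmond serves its XML on the default TCP port 8649 and that each HOST element carries NAME and REPORTED attributes. The function names and the 60-second staleness threshold are made up for illustration.

```python
import socket
import time
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump gmond sends to a TCP client before closing the socket.

    8649 is the stock gmond port; adjust if gmond.conf says otherwise.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection when the dump is done
                break
            chunks.append(data)
    return b"".join(chunks)


def heartbeat_ages(xml_bytes, now=None):
    """Map each HOST NAME to the seconds elapsed since its REPORTED time."""
    now = time.time() if now is None else now
    root = ET.fromstring(xml_bytes)
    return {
        host.get("NAME"): now - int(host.get("REPORTED"))
        for host in root.iter("HOST")
    }


def stale_hosts(ages, threshold=60):
    """Return the sorted names of hosts whose last report is older than threshold.

    The 60-second default is an arbitrary illustrative value.
    """
    return sorted(name for name, age in ages.items() if age > threshold)
```

Calling `stale_hosts(heartbeat_ages(fetch_gmond_xml("master-host")))` once a minute (with "master-host" replaced by the real server name) and logging the result would show exactly when each silent period starts and ends.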

The behavior is very peculiar.

Has anybody experienced this behavior? Am I using a bad version?

Thanks for your help,

- Michael R. Hines
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
