I was able to remove the "dead" host (which isn't really dead) from the overview display.
I had to kill every gmond on every host, as well as gmetad.
Then I removed the rrd files for the "dead" host from gmetad's rrds directory,
along with that host's rrd directory itself.
Then I removed the "dead" host's IP address from gmetad.conf.
Then I brought up all the gmonds (except the "dead" one) and then the gmetad.
Apparently, these steps will have to be added to our failover procedure.
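For the failover procedure, the file-level cleanup (removing the host's rrd directory and dropping its address from the data_source line) could be scripted roughly as below. This is a minimal sketch, not a tested tool: it assumes the common layout <rrd_rootdir>/<cluster>/<host>/*.rrd, and the paths RRD_ROOT and GMETAD_CONF plus all function names are hypothetical, so check rrd_rootdir in your own gmetad.conf. Stopping and restarting the daemons is left out because that depends on your init scripts.

```python
import shutil
from pathlib import Path

# Hypothetical defaults: check rrd_rootdir in your own gmetad.conf.
RRD_ROOT = Path("/var/lib/ganglia/rrds")
GMETAD_CONF = Path("/etc/ganglia/gmetad.conf")


def remove_host_rrds(rrd_root: Path, cluster: str, host: str) -> bool:
    """Delete <rrd_root>/<cluster>/<host>/ and all the .rrd files in it.

    Returns True if the directory existed and was removed, False otherwise.
    """
    host_dir = rrd_root / cluster / host
    if not host_dir.is_dir():
        return False
    shutil.rmtree(host_dir)
    return True


def drop_host_from_data_source(conf_text: str, cluster: str, address: str) -> str:
    """Return conf_text with `address` (with or without a :port suffix)
    removed from the data_source line for `cluster`; other lines are kept."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        # Only touch the uncommented data_source line for this cluster.
        if stripped.startswith("data_source") and f'"{cluster}"' in stripped:
            kept = [tok for tok in stripped.split() if tok.split(":")[0] != address]
            line = " ".join(kept) + "\n"
        out.append(line)
    return "".join(out)


def cleanup_dead_host(cluster: str, host: str, address: str) -> None:
    # Run only while every gmond and gmetad is stopped; restart them afterwards.
    # `host` is the directory name gmetad created under the cluster directory.
    remove_host_rrds(RRD_ROOT, cluster, host)
    GMETAD_CONF.write_text(
        drop_host_from_data_source(GMETAD_CONF.read_text(), cluster, address)
    )
```

The config rewrite is kept as a pure string function so it can be checked on a copy of gmetad.conf before it is pointed at the real file.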


Martin Knoblauch wrote:
...

 Also, just to better understand the situation, what is the exact setup? Is one of the "gmond"s designated as a collector? Or do all "gmond"s carry all metrics from all hosts? Which "gmond" is queried by "gmetad" (snippet from config file)? You should telnet/nc to that "gmond" and check whether it has current metrics from "B".
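The telnet/nc check Martin suggests can also be done from a script. A gmond accepting TCP connections (8649 is the usual default port) writes its whole XML state and then closes the connection, so reading until EOF and listing the HOST elements shows whether that gmond has heard from "B". This is a sketch under those assumptions; the function names are hypothetical, and "A"/"B" in the test data are placeholders.

```python
import socket
import xml.etree.ElementTree as ET
from typing import Dict


def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 5.0) -> bytes:
    """Read everything gmond sends on its TCP port until it closes the
    connection; this should be the same XML that `nc host 8649` prints."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the socket after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def reported_hosts(xml_data: bytes) -> Dict[str, int]:
    """Map each <HOST NAME=...> in the dump to its REPORTED timestamp
    (seconds since the epoch)."""
    root = ET.fromstring(xml_data)
    return {h.get("NAME"): int(h.get("REPORTED")) for h in root.iter("HOST")}
```

For example, `"B" in reported_hosts(fetch_gmond_xml("10.0.0.1"))` says whether that gmond knows about B at all, where 10.0.0.1 stands in for whichever gmond gmetad actually queries; comparing B's REPORTED value with the current time shows whether its data is current.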

  
I don't know what "designated as a collector" means.
Nor do I know how to control which gmonds carry the metrics from which hosts.  There is only one udp_send_channel
in gmond.conf, and the host it points to is the one running gmetad.
My /etc/ganglia/gmetad.conf file has only one line in it: data_source "clustername" followed by a
list of the IP addresses of all the gmond hosts.
(My original understanding was that gmetad queries each gmond, or that the gmonds all report to gmetad.
So I just listed all the IP addresses there.  But now it seems the flow is more complex than that.)
I don't have a manpage for gmetad.conf, so I just guessed what to put in there from the sample file.
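As I read the comments in the sample gmetad.conf, the addresses on one data_source line are alternative sources for the same cluster: gmetad tries them in order, takes the whole cluster's state from the first one that answers, and assumes port 8649 when none is given. An optional number right after the cluster name is the polling interval. The sketch below mimics that reading; it is an approximation of gmetad's behaviour, not its actual code, the function names are hypothetical, and host:port parsing ignores IPv6 literals.

```python
import shlex
import socket
from typing import List, Optional, Tuple

DEFAULT_GMOND_PORT = 8649


def parse_data_source(line: str) -> Tuple[str, Optional[int], List[Tuple[str, int]]]:
    """Split 'data_source "name" [interval] host[:port] ...' into its parts."""
    tokens = shlex.split(line)  # handles the quoted cluster name
    if len(tokens) < 2 or tokens[0] != "data_source":
        raise ValueError("not a data_source line")
    name, rest = tokens[1], tokens[2:]
    interval = None
    if rest and rest[0].isdigit():  # optional polling interval in seconds
        interval = int(rest[0])
        rest = rest[1:]
    sources = []
    for tok in rest:
        host, sep, port = tok.partition(":")
        sources.append((host, int(port) if sep else DEFAULT_GMOND_PORT))
    return name, interval, sources


def first_reachable(sources: List[Tuple[str, int]], timeout: float = 2.0) -> Optional[Tuple[str, int]]:
    """Return the first (host, port) that accepts a TCP connection, in order."""
    for host, port in sources:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return host, port
        except OSError:
            continue  # unreachable: fall through to the next listed source
    return None
```

Running `first_reachable(parse_data_source(line)[2])` on my data_source line would approximate which single gmond gmetad takes its data from, which is the one worth checking with nc.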

-Cameron





_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
