I would like to start a discussion of how to enhance Ganglia to address
issues like removing a host from the grid and reorganizing one cluster
into two clusters.
Today, I was installing Ganglia on an 8-node cluster. Apparently, this
cluster could communicate via multicast with a 4-node cluster I had
previously installed Ganglia on, because both used the default multicast
address. So, in my web front end, I ended up with two 12-node clusters,
as well as various RRD files and directories where I didn't want them.
As near as I can tell, to fix this I have to shut down about 13
instances of gmond and/or gmetad, fix up the RRD file directory and the
/etc/gmond.conf configuration files, and then start up 13 instances of
gmond and/or gmetad again.
Do we need a command to have clusters purge their internal "soft
state"? Can we get clusters to automatically detect that their cache is
out of date and refresh it? What are the appropriate ways to attack
this sort of problem?
Thanks, Chuck