Might be easier to implement this in gmetad than in the individual
monitoring daemons...
But I just noticed this as a mild annoyance (after adding a test server to
the wrong cluster :) ) - it appears the only way to really flush a server
out of the reporting cluster entirely is to restart all the daemons in the
cluster.
So there are a couple of ways of dealing with this:
* Tell gmetad to ignore data about certain hosts.
[if ($host !~ /$config{'ignore_these_hosts'}/) ... ]
* Toss some user-enabled code into gmond that "ages out" nodes that don't
respond for X seconds (1800, 3600, 86400, etc.).
* Add some sort of protocol extension that allows you to *tell* nodes to
"forget" about or otherwise "ignore" a specific node.
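
To make the three options concrete, here is a rough Python sketch. It is not
gmond or gmetad code; the HostTable class, its method names and the injected
clock are all made up for illustration. It only shows how an ignore pattern,
an age-out timer and an explicit "forget" could sit on top of one table of
last-heard timestamps.

```python
import re
import time


class HostTable:
    """Hypothetical table of hosts a daemon has heard from (not Ganglia code)."""

    def __init__(self, max_age_seconds=3600, ignore_pattern=None, clock=time.time):
        self.max_age_seconds = max_age_seconds
        self.ignore = re.compile(ignore_pattern) if ignore_pattern else None
        self.clock = clock  # injectable so the sketch can be tested without sleeping
        self.last_heard = {}  # hostname -> timestamp of the last report

    def report(self, hostname):
        """Record a report from hostname unless it matches the ignore pattern."""
        if self.ignore and self.ignore.search(hostname):
            return False
        self.last_heard[hostname] = self.clock()
        return True

    def forget(self, hostname):
        """Explicitly drop one host; returns True if it was actually known."""
        return self.last_heard.pop(hostname, None) is not None

    def expire(self):
        """Drop every host that has been silent longer than max_age_seconds."""
        now = self.clock()
        stale = [h for h, t in self.last_heard.items()
                 if now - t > self.max_age_seconds]
        for h in stale:
            del self.last_heard[h]
        return sorted(stale)
```

A daemon would call expire() from whatever periodic loop it already has. The
ignore pattern only filters what gets recorded, so it fits the gmetad side,
while forget() would need the protocol extension to carry the request between
nodes.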
And have I mentioned yet this week how much I really wish the gmond metric
numbers weren't in a single array? :) I'd go about reimplementing it, but
it would TOTALLY hose backward compatibility... however, as things stand
now, gmonds in a mixed-platform environment don't understand all of each
other's metrics, which is a bummer. Maybe we can think about this when the
format gets tweaked to allow constant/volatile metrics...
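
A toy illustration of the array problem, in Python. The metric names and the
table below are invented, not gmond's real list, and the function names are
hypothetical. The idea is only that a receiver that looks metrics up by
number in its own fixed table has to discard numbers it doesn't know, while
a receiver that gets a name with each value can store a metric it has never
seen before.

```python
# Hypothetical metric table known to the receiver; not gmond's real list.
RECEIVER_METRICS = ["cpu_user", "mem_free", "swap_free"]


def decode_by_number(metric_number, value, table=RECEIVER_METRICS):
    """Array-style lookup: numbers outside the local table cannot be decoded."""
    if 0 <= metric_number < len(table):
        return table[metric_number], value
    return None


def decode_by_name(name, value, store):
    """Name-keyed lookup: an unfamiliar metric is simply stored under its name."""
    store[name] = value
    return name, value
```

The cost of the second form is a bigger message, since every value carries
its name, and it is the kind of change that would break compatibility with
existing daemons.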