Might be easier to implement this in gmetad than in the individual monitoring daemons...

But I just noticed this as a mild annoyance (after adding a test server to the wrong cluster :) ): as far as I can tell, the only way you'll ever really flush a server out of the reporting cluster entirely is to restart all the daemons in the cluster.

So there are a few ways of dealing with this:

* Tell gmetad to ignore data about certain hosts:
   [if ($hostname !~ /$config{'ignore_these_hosts'}/) { ... }]
* Toss some optional, user-enabled code into gmond that "ages out" nodes that haven't reported for X seconds (1800, 3600, 86400, etc.).
* Add some sort of protocol extension that lets you *tell* nodes to "forget" or otherwise ignore a specific node.
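A rough sketch of how all three options could sit on top of a per-host "last heard" table, in Python for brevity. The `HostTable` class, its `ignore_pattern` and `max_silence` settings, and the method names are hypothetical illustrations, not existing gmond or gmetad code; a real daemon would call something like `record()` wherever it handles an incoming report and run `expire()` on a timer.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HostTable:
    """Hypothetical per-cluster host table (not real gmond/gmetad code)."""
    ignore_pattern: str = ""        # option 1: regex of host names to ignore
    max_silence: float = 3600.0     # option 2: age out after this many seconds
    last_heard: Dict[str, float] = field(default_factory=dict)

    def record(self, host: str, now: Optional[float] = None) -> bool:
        """Note a report from `host`; returns False if the host is ignored."""
        if self.ignore_pattern and re.search(self.ignore_pattern, host):
            return False
        self.last_heard[host] = time.time() if now is None else now
        return True

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Drop hosts silent for longer than max_silence; return their names."""
        now = time.time() if now is None else now
        stale = [h for h, t in self.last_heard.items() if now - t > self.max_silence]
        for h in stale:
            del self.last_heard[h]
        return stale

    def forget(self, host: str) -> bool:
        """Option 3: handle an explicit 'forget this host' request."""
        return self.last_heard.pop(host, None) is not None

    def hosts(self) -> List[str]:
        return sorted(self.last_heard)


# Usage: a test box is filtered out, and a silent host ages out.
table = HostTable(ignore_pattern=r"^test-", max_silence=1800)
table.record("web1", now=0)
table.record("test-box", now=0)      # ignored by the regex
table.record("db1", now=1000)
print(table.expire(now=2000))        # ['web1']
print(table.hosts())                 # ['db1']
```

Both the age-out sweep and the explicit "forget" simply delete the host's entry, so the host drops out of the reports without restarting anything.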

And have I mentioned yet this week how much I really wish the gmond metric numbers weren't in a single array? :) I'd go about reimplementing it, but it would TOTALLY hose backward compatibility... However, as things stand now, gmonds in a mixed-platform environment don't understand all of each other's metrics, which is a bummer. Maybe while the format is being tweaked to allow constant/volatile metrics, we can think about this too...
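To illustrate why a single numbered array hurts mixed-platform clusters, here's a small Python sketch contrasting an index-only encoding with a self-describing one that carries the metric name and units alongside the value. The metric tables, `NamedMetric`, and both decode functions are hypothetical and do not reflect gmond's actual wire format.

```python
from typing import List, NamedTuple, Optional, Tuple


class NamedMetric(NamedTuple):
    """Hypothetical self-describing metric message."""
    name: str
    units: str
    value: float


# Hypothetical metric tables: the sender's platform defines an extra metric.
SENDER_TABLE = ["cpu_num", "load_one", "mem_free", "sender_only_metric"]
RECEIVER_TABLE = ["cpu_num", "load_one", "mem_free"]


def decode_indexed(index: int, value: float,
                   table: List[str]) -> Optional[Tuple[str, float]]:
    """Index-only format: meaning comes solely from the receiver's own table."""
    if 0 <= index < len(table):
        return (table[index], value)
    return None  # unknown index: the receiver can't tell what this value is


def decode_named(metric: NamedMetric) -> Tuple[str, str, float]:
    """Self-describing format: name and units travel with the value."""
    return (metric.name, metric.units, metric.value)


idx = SENDER_TABLE.index("sender_only_metric")
print(decode_indexed(idx, 42.0, RECEIVER_TABLE))   # None
print(decode_named(NamedMetric("sender_only_metric", "count", 42.0)))
```

The trade-off is a bigger packet per metric, which is presumably part of why a numbered array was attractive in the first place.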

