Matt Massie wrote:
prashant-
so when a node in the cluster dies the cluster size changes but the dead
node is not reported?
this is a new problem that i haven't heard of before. did gmond get
restarted after the node failed? ganglia knows that a node has died when it
stops getting heartbeats from a machine that it previously heard from. if
gmond is somehow getting restarted, it won't know about the dead node
because it hasn't received even a single heartbeat from it (remember that
everything in gmond is soft state).
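To make the soft-state point concrete, here is a minimal Python sketch of heartbeat-based liveness tracking. It is not gmond's actual code: the `SoftStateTable` class, its methods, and the 60-second timeout are hypothetical and only illustrate the behavior described above. A host counts as dead only if it was heard from once and then went quiet. A freshly started table has never heard from the failed node, so that node simply disappears from the host list (the cluster size shrinks) instead of being reported as down.

```python
import time
from typing import Dict, List, Optional


class SoftStateTable:
    """Hypothetical sketch of soft-state heartbeat tracking (not gmond's code).

    Nothing is persisted: a host is known only after at least one heartbeat
    has been received from it since this table was created.
    """

    def __init__(self, timeout_s: float = 60.0) -> None:
        # timeout_s is an assumed value, chosen only for illustration.
        self.timeout_s = timeout_s
        self.last_heard: Dict[str, float] = {}

    def heartbeat(self, host: str, now: Optional[float] = None) -> None:
        # Record (or refresh) the last time we heard from this host.
        self.last_heard[host] = time.time() if now is None else now

    def dead_hosts(self, now: Optional[float] = None) -> List[str]:
        # Dead = heard from before, but not within the timeout window.
        now = time.time() if now is None else now
        return sorted(h for h, t in self.last_heard.items()
                      if now - t > self.timeout_s)

    def cluster_size(self) -> int:
        # Every host we have ever heard from, alive or dead.
        return len(self.last_heard)


if __name__ == "__main__":
    # Long-running daemon: node2 stops sending heartbeats after t=0.
    table = SoftStateTable(timeout_s=60)
    table.heartbeat("node1", now=0)
    table.heartbeat("node2", now=0)
    table.heartbeat("node1", now=100)
    print(table.dead_hosts(now=100), table.cluster_size())

    # Simulated restart after node2 died: all soft state is gone.
    restarted = SoftStateTable(timeout_s=60)
    restarted.heartbeat("node1", now=100)
    print(restarted.dead_hosts(now=100), restarted.cluster_size())
```

The first table reports `['node2']` as dead with a cluster size of 2. The restarted table reports no dead hosts and a cluster size of 1, which matches the symptom of the cluster shrinking without any dead node showing up.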
is it possible that your gmond data source was restarted after the node
died?
i'm sure if we walk through this we'll find the solution to the problem.
Now that I think about it, I seem to recall this happening to me in one of
the recent (but not current) 2.5.x frontend revisions. There was a bug in
(I believe) ganglia.php that was not updating the dead node array.
I'm pretty sure the reason I didn't respond to the original message was
that he's using the most current version and still gets the same behavior,
so I was stumped. But I just had that idea again and decided to throw it
out there in the hope of it being useful...
And I know none of the regular readers of this list believe me, but I
really *do* try not to go shooting off my mouth when I have no idea how to
fix the problem... :)