matt massie wrote:
prashant-

so when a node in the cluster dies the cluster size changes but the dead node is not reported?

this is a new problem that i haven't heard of before. did gmond get restarted after the node failed? ganglia knows a node has died when it stops getting heartbeats from a machine that it previously heard from. if gmond was restarted somehow, it wouldn't know about the dead node because it has never received a single heartbeat from it (remember that everything in gmond is soft state).
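
To make the soft-state point concrete, here is a minimal sketch of heartbeat-based dead-node detection. It is not gmond's actual code: the class name `HeartbeatTable`, its methods, and the 60-second timeout are all hypothetical, chosen only to illustrate why a restart makes a dead node invisible.

```python
class HeartbeatTable:
    """Hypothetical soft-state table: a host exists only after a heartbeat arrives."""

    def __init__(self, timeout):
        # Seconds of silence after which a known host counts as dead (assumed value).
        self.timeout = timeout
        self.last_seen = {}  # host name -> time of the most recent heartbeat

    def heartbeat(self, host, now):
        # Record (or refresh) the last time this host was heard from.
        self.last_seen[host] = now

    def dead_hosts(self, now):
        # Only hosts heard from at least once can ever be reported dead.
        return sorted(h for h, t in self.last_seen.items()
                      if now - t > self.timeout)

    def restart(self):
        # Soft state: nothing survives a restart.
        self.last_seen.clear()


table = HeartbeatTable(timeout=60)
table.heartbeat("node1", now=0)
table.heartbeat("node2", now=0)
table.heartbeat("node1", now=100)     # node2 has gone silent
before = table.dead_hosts(now=100)    # node2 is known, so it shows up as dead

table.restart()                       # daemon restarted after node2 died
table.heartbeat("node1", now=110)
after = table.dead_hosts(now=120)     # node2 was never heard from, so it is not listed
```

In this sketch the restarted table reports a smaller cluster with no dead hosts, which matches the symptom described: the cluster size changes but the dead node is not reported.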

is it possible that your gmond data source was restarted after the node died?

i'm sure if we walk through this we'll find the solution to the problem.

Now that I think about it, I seem to recall this happening to me in one of the recent (but not current) 2.5.x frontend revisions. There was a bug in (I believe) ganglia.php where the dead node array was not being incremented.

I'm pretty sure the reason I didn't respond to the original message was that he's using the most current version and still gets the same behavior, so I was stumped. But I just had that idea again and decided to throw it out there in the hope of it being useful...

And I know none of the regular readers of this list believe me, but I really *do* try not to go shooting off my mouth when I have no idea how to fix the problem... :)

