prashant-

so when a node in the cluster dies the cluster size changes but the dead 
node is not reported?

this is a problem i haven't heard of before.  did gmond get restarted 
after the node failed?  ganglia knows that a node has died when it stops 
getting heartbeats from a machine that it previously heard from.  if gmond 
is somehow getting restarted, it wouldn't know about the dead node at all, 
because it never received a single heartbeat from it (remember that 
everything in gmond is soft state).
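
a toy python model may make the soft-state point concrete.  this is only 
a sketch, not gmond's actual code: the SoftStateCluster class, the node 
names, and the 80-second timeout are all made up for illustration.  a 
gmond that keeps running reports the silent host as down, while one that 
is restarted after the host died never learns the host existed.

```python
from dataclasses import dataclass, field


@dataclass
class SoftStateCluster:
    """Hypothetical model of gmond-style soft state: a host is known only
    if a heartbeat from it arrived since this process started."""
    timeout: float = 80.0  # hypothetical seconds of silence before "down"
    last_heard: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, host: str, now: float) -> None:
        # Record (or refresh) the last time we heard from this host.
        self.last_heard[host] = now

    def hosts_up(self, now: float) -> list[str]:
        return sorted(h for h, t in self.last_heard.items() if now - t <= self.timeout)

    def hosts_down(self, now: float) -> list[str]:
        # Only hosts we have heard from at least once can ever be "down".
        return sorted(h for h, t in self.last_heard.items() if now - t > self.timeout)


hosts = [f"node{i:02d}" for i in range(1, 13)]  # 12 hypothetical nodes

# Case 1: gmond runs the whole time; node12 goes silent after t=0.
running = SoftStateCluster()
for h in hosts:
    running.heartbeat(h, now=0.0)
for h in hosts[:-1]:
    running.heartbeat(h, now=100.0)
print(len(running.hosts_up(120.0)), running.hosts_down(120.0))  # 11 ['node12']

# Case 2: gmond is restarted at t=100, after node12 already died.
restarted = SoftStateCluster()
for h in hosts[:-1]:
    restarted.heartbeat(h, now=100.0)
print(len(restarted.hosts_up(120.0)), restarted.hosts_down(120.0))  # 11 []
```

that second case matches what you're describing: 11 hosts, zero down.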

is it possible that your gmond data source was restarted after the node 
died?

i'm sure if we walk through this we'll find the solution to the problem.
-- 
matt

Yesterday, Prashant Bhamidipati wrote forth saying...

> Hi Steven / Matt,
> 
> I have Ganglia up and running on two farms, and everything was
> working well until 2 days ago.
> 
> One of the machines on a farm was lost due to a network connection
> problem.
> 
> But ganglia still shows all nodes as up and running (???). How can
> I rectify this problem?
> 
> For example: if one of 12 nodes dies, ganglia tells me that there are 11
> nodes in the cluster and all are up and working, i.e. there are zero nodes
> down. Why does it not tell me that the 12th node is dead and that
> 11 out of 12 nodes are working instead?
> 
> -Prashant
> 
> _______________________________________________
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
> 

