joe-

gstat is a command-line tool that gives you a quick look at the hosts you 
are monitoring.  run gstat from the command prompt to see its output 
(gstat --help will give you more info).

as far as the graph is concerned.. it's hard to tell what is going on.  
the graphs are created by rrdtool from data that is stored by gmetad.  if 
you don't see errors in the syslog for the host running gmetad, then i 
suspect it might be a problem with multicast support on the network.  

we can use the gstat commandline tool as a test.. on the host running 
gmetad (and gmond to listen to the multicast traffic).. run

% gstat --dead

to list all the dead hosts.  if over time you see hosts pop in and out of 
this list then you know that multicast traffic is getting lost and ganglia 
thinks the host has died a horrible death.  
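
since the flapping may take a few minutes to show up, it can help to log 
the dead list on a timer instead of rerunning the command by hand.  here 
is a minimal sketch, assuming a POSIX shell.. log_dead and its arguments 
are names made up for this example, and the only gstat option it uses is 
the --dead shown above.

```shell
# log_dead INTERVAL COUNT LOGFILE   (hypothetical helper, not part of ganglia)
#   INTERVAL  seconds to wait between polls
#   COUNT     number of polls to make
#   LOGFILE   file that the timestamped snapshots are appended to
log_dead() {
    interval=$1
    count=$2
    logfile=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            # timestamp header so snapshots can be lined up afterwards
            echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
            # record the dead-host list; note a failure instead of aborting
            gstat --dead 2>&1 || echo "gstat exited with status $?"
        } >> "$logfile"
        i=$((i + 1))
        # sleep between polls, but not after the last one
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

for example

% log_dead 30 30 /tmp/gstat-dead.log

polls every 30 seconds for about 15 minutes, which would span two of the 
7 minute cycles described in the message below.  hosts that show up in 
some snapshots and not in others are the ones to look at.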

tcpdump could also help.. running

% tcpdump net 239.2.11.71

(replace 239.2.11.71 with whatever multicast address you are 
using.. 239.2.11.71 is the default) 

tcpdump will list all the ganglia multicast traffic in real time.. you
should see every host you are monitoring in the list of machines.
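
to turn that stream into a list you can check against your own node list, 
something like the following could work.  it's only a sketch.. 
unique_sources is a made-up helper, and it assumes tcpdump prints each 
packet on one line as "... SRC.PORT > DST.PORT ...", which may differ 
between tcpdump versions.

```shell
# unique_sources  (hypothetical helper)
# reads tcpdump output on stdin and prints each distinct sender once.
# assumption: the sender is the field right before the ">" on each line,
# written as ADDRESS.PORT (which is what "tcpdump -n" usually prints).
unique_sources() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i == ">") {
                src = $(i - 1)
                sub(/\.[^.]*$/, "", src)   # drop the trailing .PORT
                print src
                break
            }
        }
    }' | sort -u
}
```

for example

% tcpdump -n -c 500 net 239.2.11.71 | unique_sources

captures 500 packets (-c 500) without name lookups (-n) and prints the 
senders seen.  any monitored host missing from that list is a candidate 
for the multicast problem.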

i'm sure with a little work we'll find the source of the problem.
-- 
matt

Today, Joe Griffin wrote forth saying...

> Hi All,
> 
> I have an IA32 cluster in which gstat
> shows all nodes for about 30 seconds,
> then only shows the compute node for
> about 6.5 minutes (ie a 7 minute period).
> I have looked at the system log files,
> but did not see anything.
> 
> I can "telnet NODE 8649" to a compute
> node and get information even though
> gstat can't see the data.  So I know
> that the node is broadcasting.  My
> ganglia plot is attached.
> 
> Does anyone have any thoughts as to
> what would cause this intermittent behavior?
> 
> I am using ganglia 2.5.1.
> 
> ganglia runs great on my other clusters.
> unfortunately I did not set up the
> network on this cluster, only ganglia.
> 
> TIA,
> Joe Griffin
> 

