Sorry if this mail is more appropriate to the users list.  I'll carry it 
over there if that turns out to be the case.  As it is, I'm trying to use 
the vast improvement that is 2.5.0 to get some time from management to 
work on ganglia.

I've got several small (<32 nodes) clusters of Linux systems, with FreeBSD
or Linux file servers, that are working great with gmond.

I'm running gmetad on a Linux 2.4.19 system (RH 7.1 + updates).  It sees 
the clusters just fine, and the web front end makes great graphs of them.

I'm also using gmond to monitor our workstation network (mix of IRIX,
FreeBSD, Linux), with the same gmetad collecting the data; herein lies the
problem.  

With the old gmond (2.4.1) things mostly worked, though we often 
had IRIX machines where gmond would just silently segfault and never be 
heard from again.  We also had a problem with machines (also mostly IRIX) 
being marked as down from time to time when they (and their gmond) were 
actually fine.  Nevertheless, it was usable and mostly consistent.

With 2.5.0, gmond is much more stable, and it has stopped marking live 
hosts as dead.  However, on the workstation network (which happens to be the 
same network as the gmetad server), the web front end is showing graphs 
that have large gaps in them.  The values reported for "now" always 
appear to be correct, but the values are graphed incorrectly.

For an example (not live, just a dump to html), see 
http://wwwx.atos-group.nl/admn/gmetad_ex/gmetad.html

This particular graph is for a Linux system, but it is on the same 
multicast channel as the IRIX systems.

So where should I begin to look?  I suspect that it is actually a problem 
with gmond, most likely on the IRIX systems, since gmetad and the web 
front end are working great on the clusters.  

regards,
-Ryan

-- 
Ryan Sweet <[EMAIL PROTECTED]>
Atos Origin Engineering Services
http://www.aoes.nl



