Sorry if this mail is more appropriate for the users list. I'll move it over there if that turns out to be the case. As it is, I'm trying to use the vast improvement that is 2.5.0 to get some time from management to work on ganglia.

I've got several small (<32 nodes) clusters of Linux systems, with FreeBSD or Linux file servers, that are working great with gmond. I'm running gmetad on a Linux 2.4.19 system (RH 7.1 + updates). It sees the clusters just fine, and the web front end makes great graphs of them.

I'm also using gmond to monitor our workstation network (a mix of IRIX, FreeBSD, and Linux), with the same gmetad collecting the data; herein lies the problem.

With the old gmond (2.4.1) things mostly worked, though we often had IRIX machines where gmond would silently segfault and never be heard from again. We also had a problem with machines (again mostly the IRIX ones) being marked as down from time to time when they (and their gmond) were actually fine. Nevertheless, it was usable and mostly consistent.

With 2.5.0, gmond is much more stable, and it has stopped marking live hosts as dead. However, on the workstation network (which happens to be the same network as the gmetad server), the web front end is showing graphs that have large gaps in them. The values reported for "now" always appear to be correct, but the values are graphed incorrectly.

For an example (not live, just a dump to HTML), see
http://wwwx.atos-group.nl/admn/gmetad_ex/gmetad.html

This particular graph is for a Linux system, but it is on the same multicast channel as the IRIX systems.

So where should I begin to look? I suspect that it is actually a problem with gmond, most likely on the IRIX systems, since gmetad and the web front end are working great on the clusters.

regards,
-Ryan

--
Ryan Sweet <[EMAIL PROTECTED]>
Atos Origin Engineering Services
http://www.aoes.nl
