I have Ganglia 3.0.4 installed and mostly working properly.  We have lots 
of hosts, so the rrds are on tmpfs.  I have a grid-of-grids configuration, 
with a separate gmetad running for each grid, all on the same host.  The 
server in question is pretty beefy: it's a Sun T2000 that typically has a 
load average around one.

One of the grids (GRID-A) has about 700 hosts and is made up of two 
clusters (525 and 175 hosts).

When you visit the page for GRID-A, you will sometimes see accurate 
counts of hosts up and cpus total, and sometimes some of those numbers 
will be low.  It seems as though the web server is handing out data that 
comes from the middle of a calculation cycle.  Sometimes the cpus total 
number for one of the clusters will be quite low; occasionally you'll 
see zeros for all three (cpus total, hosts up, hosts down) for the grid 
summary.  Sometimes the grid summary number is not the same as the sum 
of the numbers from the two clusters.   The graphs shown on this page 
seem to always be correct.  The red and green lines showing CPUs and 
Nodes are perfectly flat.
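
One way to catch the mismatch outside the web frontend would be to take 
gmetad's XML output and compare each grid's summary against the sum of its 
clusters.  The following is only a rough sketch, not something from my 
setup: it assumes the XML has already been fetched as a string, and that 
each GRID and CLUSTER element carries a direct HOSTS child with UP and DOWN 
attributes (summary-style output).  The names check_grid_summaries and 
SAMPLE, and the numbers in SAMPLE, are made up for illustration.

```python
import xml.etree.ElementTree as ET


def hosts_counts(elem):
    """Return (up, down) from the element's direct <HOSTS> child, or None."""
    hosts = elem.find("HOSTS")
    if hosts is None:
        return None
    return int(hosts.get("UP", 0)), int(hosts.get("DOWN", 0))


def check_grid_summaries(xml_text):
    """Return (grid_name, grid_counts, cluster_sum) for each mismatched grid.

    Assumption: summary-style XML where GRID and CLUSTER elements each have
    a direct <HOSTS UP=".." DOWN=".."/> child.
    """
    root = ET.fromstring(xml_text)
    mismatches = []
    for grid in root.iter("GRID"):
        # Skip grids that contain other grids; only cluster-only grids
        # can be compared directly against the sum of their clusters.
        if grid.find("GRID") is not None:
            continue
        grid_counts = hosts_counts(grid)
        cluster_counts = [hosts_counts(c) for c in grid.findall("CLUSTER")]
        cluster_counts = [c for c in cluster_counts if c is not None]
        if grid_counts is None or not cluster_counts:
            continue
        total = (
            sum(up for up, _ in cluster_counts),
            sum(down for _, down in cluster_counts),
        )
        if total != grid_counts:
            mismatches.append((grid.get("NAME"), grid_counts, total))
    return mismatches


# Made-up sample in which the grid summary lags behind its clusters.
SAMPLE = """<GANGLIA_XML VERSION="3.0.4" SOURCE="gmetad">
<GRID NAME="GRID-A" AUTHORITY="http://example.invalid/" LOCALTIME="0">
<HOSTS UP="600" DOWN="0"/>
<CLUSTER NAME="c1" LOCALTIME="0"><HOSTS UP="525" DOWN="0"/></CLUSTER>
<CLUSTER NAME="c2" LOCALTIME="0"><HOSTS UP="175" DOWN="0"/></CLUSTER>
</GRID>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(check_grid_summaries(SAMPLE))
```

Run in a loop against real output, something like this might show whether 
the bad numbers are already present in what gmetad hands out or only 
appear in the web pages.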

If you visit the parent of GRID-A (the top-level grid that rolls up all 
the other grids), you will sometimes see low numbers for GRID-A, and 
sometimes the overall total numbers will be wrong.  On the top-level grid 
page, the red and green lines for GRID-A and for the summary are not 
flat, but show "dips" of varying severity.

This problem was worse before I moved the rrds to tmpfs.  Things have 
improved markedly since, but the problem has not gone away completely.  
The other grids are smaller (the largest has 400 hosts) and do not show 
these symptoms.

Any ideas?




_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
