Hello everyone,
I just want to report my experience with Ganglia, which I have come to
know as an excellent tool with a slight tendency to astonish me.
First: Ganglia is a great system, exactly what I need to monitor my
machines. Thanks for writing it.
We have about 170 machines aggregated in a bunch of small clusters. The
largest cluster has 20 nodes, some clusters have just 4 nodes. Currently,
I'm building up our Map24 meta grid containing seven sub grids with four
to six clusters in each grid.
At the moment I have configured three of the seven planned grids. Since
version 3.0.1 lots of problems have disappeared - thank you. Every grid
view is working perfectly. Gmetad collects all the data coming from the
clusters and displays them perfectly with the web front end.
The meta grid view shows a different picture:
Grid summary: Hosts up: 66, hosts down: 8
Webserv grid: Hosts up: 16, hosts down: 3
Maptp07 grid: Hosts up: 11, Hosts down: 0
Maptp13 grid: Hosts up: 44, Hosts down: 0
The summary shows 66 running hosts (it should be 71, adding up the rows
above) and 8 hosts down (it should be 3). If I look at each grid
individually, I see the following data:
Webserv grid: Hosts up: 19, hosts down: 0 (this is ok)
Maptp07 grid: Hosts up: 11, Hosts down: 0 (this is ok)
Maptp13 grid: Hosts up: 44, Hosts down: 0 (this is ok)
Hmmm. So it should be 74 running hosts and 0 hosts down. If I click on
"Get fresh data", the meta grid shows:
Grid summary: Hosts up: 66, hosts down: 0
Webserv grid: Hosts up: 16, hosts down: 2
Maptp07 grid: Hosts up: 11, Hosts down: 7
Maptp13 grid: Hosts up: 44, Hosts down: 0
I just clicked on "Get fresh data" again and got new numbers. They are
very fresh, but unfortunately very wrong. All hosts are up, which is
shown correctly in each grid's own summary, but the meta grid summary
shows only nonsense.
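
The mismatch in the first meta grid view above can be checked
mechanically. This is a minimal Python sketch that uses only the numbers
quoted in this post; the variable and function names are made up for
illustration:

```python
# Rows of the first meta grid view quoted above: grid -> (hosts up, hosts down)
meta_rows = {"Webserv": (16, 3), "Maptp07": (11, 0), "Maptp13": (44, 0)}

# What each grid's own view reports for the same grids
grid_views = {"Webserv": (19, 0), "Maptp07": (11, 0), "Maptp13": (44, 0)}

# The "Grid summary" line shown by the meta grid at the same time
reported_summary = (66, 8)


def totals(rows):
    """Add up the (up, down) pairs of all grids."""
    up = sum(u for u, _ in rows.values())
    down = sum(d for _, d in rows.values())
    return up, down


print(totals(meta_rows))                      # (71, 3)
print(totals(grid_views))                     # (74, 0)
print(totals(meta_rows) == reported_summary)  # False
```

The reported summary therefore agrees neither with the meta grid's own
rows nor with the individual grid views.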
The meta grid summary was working perfectly with two sub grids (webserv,
maptp7) and started to look interesting when I connected the maptp13
cluster. The number of CPUs is now shown like /\_/\/\_/ when it should
look like --------------.
I have four gmetad instances (grids: webserv, maptp7, maptp13 and the
meta grid Map24) running on one single machine, each on a different
port. The load on this machine is ridiculously small; apache, sendmail
and gmetad are running alongside the usual suspects... I'm using a dual
processor system (Pentium III, 1 GHz) with 1.5 GB RAM and Red Hat
Enterprise Linux AS release 3.
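
One way to narrow this down might be to read the XML that each gmetad
serves on its port and compare the host counts directly, bypassing the
web front end. The following Python sketch assumes that the XML contains
GRID elements with a NAME attribute and a direct HOSTS child carrying UP
and DOWN attributes; that layout is an assumption about the gmetad
output, not something confirmed in this post. The function names, host
and port are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port, timeout=10.0):
    """Read everything a gmetad writes to a TCP port until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_per_grid(xml_bytes):
    """Map each GRID name to (up, down) from its direct HOSTS child.

    Assumed layout: <GRID NAME="..."><HOSTS UP="n" DOWN="m" .../></GRID>.
    Grids without a direct HOSTS child are skipped.
    """
    root = ET.fromstring(xml_bytes)
    counts = {}
    for grid in root.iter("GRID"):
        hosts = grid.find("HOSTS")  # direct children only
        if hosts is not None:
            counts[grid.get("NAME")] = (int(hosts.get("UP")),
                                        int(hosts.get("DOWN")))
    return counts
```

Calling something like hosts_per_grid(fetch_xml("localhost", 8651))
against each of the four ports (the port here is only a placeholder)
would show whether the wrong numbers already appear in the XML of the
meta gmetad or only in the web front end.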
Any ideas?
Thanks,
Stefan
--
Mapsolute GmbH
Map24 Systems and Networks
Stefan Schustereit