Hi All,
I'm setting up Ganglia 2.5.7 on an Apple Xserve cluster. I have gmond
on 27 cluster nodes and gmetad and the web frontend on a machine acting
as the head node. These are all on the same network/subnet and the
head node is currently not multihomed (although it will be). I am
using Mac OS X Server 10.3.6 (Darwin 7.6.0) on the head and compute
nodes, and I also have rrdtool 1.0.49 installed on the head node.
I compiled gmond with the vanilla options.
./configure
make
make install
and gmetad with the following options (I substituted my own rrdtool
header and library paths for the placeholders):
./configure CFLAGS="-I/rrd/header/path" CPPFLAGS="-I/rrd/header/path" \
LDFLAGS="-L/rrd/library/path" --with-gmetad
make
make install
which produced no error messages during installation (I can provide the
configure/make/make install output if required).
When I run jobs on the cluster, the 1/5/15-minute load metrics show work
happening, but none of the CPU metrics (user/system/nice etc.) show
anything. For an example of what I mean, download my two screenshots from
http://www.maccs.mq.edu.au/~crichard/ganglia/
The job I'm running in these screenshots shows up in top as "Xgrid"
and is using between 30% and 50% of this node's CPU; it is also a user
process. The graphed load metrics seem to reflect this, but you
can see that the CPU graphs are all empty. You'll also notice that the
"Network" graph in the first PDF is missing entirely. Might this be due
to a difference in the way Linux and OS X name network devices (eth0 as
opposed to en0)?
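For reference, here is a minimal sketch (in C, to match darwin.c) of choosing interfaces by their flags rather than by a hard-coded name such as eth0, so that en0 on OS X would be picked up the same way. I haven't checked whether gmond 2.5.7 actually looks interfaces up by name; interface_counts() and print_counted_interfaces() are hypothetical helpers, not Ganglia functions, and the sketch assumes getifaddrs() is available (it is on Linux/glibc and, I believe, on OS X 10.3).

```c
#define _DEFAULT_SOURCE /* glibc: expose the IFF_* flags; harmless elsewhere */
#include <assert.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical filter: count an interface toward the network totals if it
   is up and not loopback, whatever it is called (eth0, en0, ...). */
static bool interface_counts(const char *name, unsigned int flags)
{
    assert(name != NULL);
    if (name[0] == '\0')
        return false;
    if (!(flags & IFF_UP))
        return false;
    return !(flags & IFF_LOOPBACK);
}

/* Hypothetical helper: print the name of every interface entry the filter
   accepts. getifaddrs() returns one entry per address, so a name may be
   printed more than once. Returns -1 if getifaddrs() fails. */
int print_counted_interfaces(void)
{
    struct ifaddrs *list;
    struct ifaddrs *ifa;

    if (getifaddrs(&list) != 0)
        return -1;
    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (interface_counts(ifa->ifa_name, ifa->ifa_flags))
            printf("%s\n", ifa->ifa_name);
    }
    freeifaddrs(list);
    return 0;
}
```

Running print_counted_interfaces() on one of the Xserves would be a quick way to see which names (en0, en1, ...) gmond would have to handle.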
I've read through all the documentation that comes with the
ganglia-core tarball and searched the gmond.conf and
gmetad.conf files for something that might explain this, but I'm not
getting anywhere.
Is there anyone out there who has successfully set up Ganglia on a Mac
OS X cluster and can tell me what options they used to
configure/compile Ganglia, and any tricks/tips they used to get it
working? Alternatively, are there parts of Ganglia that are known not
to work on Mac OS X? For instance, if you look in the
./gmond/machines/darwin.c file in the source distribution, there are a
lot of functions that look like the following:
g_val_t
cpu_system_func ( void )
{
    g_val_t val;
    val.f = 0.0;
    return val;
}
To me, these metrics look like they simply aren't implemented yet.
Can one of the developers confirm this?
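For what it's worth, this is roughly what I would expect a working version to need: a minimal sketch, assuming Mach's host_statistics() with HOST_CPU_LOAD_INFO (which I believe exists on 10.3, though I haven't verified it) as the source of cumulative user/system/nice/idle ticks. cpu_ticks_t, cpu_pct_t, read_cpu_ticks() and cpu_percentages() are hypothetical names, not part of Ganglia. The Mach call is only compiled on Apple systems, so the percentage arithmetic can be checked anywhere.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <mach/mach.h>
#endif

/* Hypothetical: cumulative CPU ticks since boot, per scheduler state. */
typedef struct {
    unsigned long user, system, nice, idle;
} cpu_ticks_t;

/* Hypothetical: share of one sampling interval spent in each state, in %. */
typedef struct {
    double user, system, nice, idle;
} cpu_pct_t;

/* Turn two cumulative samples (t0 taken before t1) into percentages of the
   ticks that elapsed between them; all zeros if no ticks elapsed. */
static cpu_pct_t cpu_percentages(const cpu_ticks_t *t0, const cpu_ticks_t *t1)
{
    cpu_pct_t p = {0.0, 0.0, 0.0, 0.0};
    unsigned long du, ds, dn, di, total;

    assert(t0 != NULL && t1 != NULL);
    du = t1->user - t0->user;
    ds = t1->system - t0->system;
    dn = t1->nice - t0->nice;
    di = t1->idle - t0->idle;
    total = du + ds + dn + di;
    if (total == 0)
        return p;

    p.user   = 100.0 * (double)du / (double)total;
    p.system = 100.0 * (double)ds / (double)total;
    p.nice   = 100.0 * (double)dn / (double)total;
    p.idle   = 100.0 * (double)di / (double)total;
    return p;
}

#ifdef __APPLE__
/* Darwin only: read the cumulative ticks via host_statistics().
   Returns 0 on success, -1 on failure. */
int read_cpu_ticks(cpu_ticks_t *out)
{
    host_cpu_load_info_data_t info;
    mach_msg_type_number_t count = HOST_CPU_LOAD_INFO_COUNT;

    if (host_statistics(mach_host_self(), HOST_CPU_LOAD_INFO,
                        (host_info_t)&info, &count) != KERN_SUCCESS)
        return -1;
    out->user   = info.cpu_ticks[CPU_STATE_USER];
    out->system = info.cpu_ticks[CPU_STATE_SYSTEM];
    out->nice   = info.cpu_ticks[CPU_STATE_NICE];
    out->idle   = info.cpu_ticks[CPU_STATE_IDLE];
    return 0;
}
#endif
```

Because cpu_system_func() takes no arguments, a real implementation would presumably keep the previous sample in a static variable and report something like cpu_percentages(&prev, &now).system in val.f.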
thanks,
Craig