Hi All,

I'm setting up Ganglia 2.5.7 on an Apple Xserve cluster. I have gmond on 27 cluster nodes, and gmetad and the web frontend on a machine acting as the head node. These are all on the same network/subnet, and the head node is currently not multihomed (although it will be). I am using Mac OS X Server 10.3.6 (Darwin 7.6.0) on the head and compute nodes. I also have rrdtool 1.0.49 installed on the head node.

I compiled gmond with the vanilla options.
./configure
make
make install

and gmetad with the following options (I substituted my own rrd header and library paths).
./configure CFLAGS="-I/rrd/header/path" CPPFLAGS="-I/rrd/header/path" LDFLAGS="-L/rrd/library/path" --with-gmetad
make
make install

which produced no error messages during installation (I can provide the configure/make/make install output if required).

When I run jobs on the cluster, the load 1/5/15 metrics show work happening, but none of the CPU metrics (user/system/nice etc.) show anything. For an example of what I mean, see my two screenshots at
http://www.maccs.mq.edu.au/~crichard/ganglia/

The job I'm running in these screenshots is visible in top as "Xgrid" and is using between 30% and 50% of this node's CPU; it is also a user process. The load graphs seem to reflect this, but you can see that the CPU graphs are all empty. You'll also notice that the "Network" graph in the first PDF is nonexistent. Might this be due to a difference in the way Linux and OS X label network devices (eth0 on Linux as opposed to en0 on OS X)?

I've read through all the documentation that comes with the ganglia-core tarball and searched through the gmond.conf and gmetad.conf files for something that might explain this but I'm not getting anywhere.

Is there anyone out there who has successfully set up Ganglia on a Mac OS X cluster who can tell me what options they used to configure/compile Ganglia, and any tricks/tips they have used to get it to work? Alternatively, are there some parts of Ganglia that are known not to work on Mac OS X? For instance, if you have a look at the ./gmond/machines/darwin.c file in the source distribution, there are a lot of functions that look like the following.

g_val_t
cpu_system_func ( void )
{
   g_val_t val;
   val.f = 0.0;
   return val;
}

To me, it looks like these metrics are simply not implemented yet. Perhaps one of the developers can confirm this?
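
For comparison, here is a rough sketch of the arithmetic a non-stub version of such a function would probably need: take two snapshots of the cumulative per-state CPU tick counters and report one state's share of the ticks that elapsed between them. This is only an illustration, not code from Ganglia. The names cpu_sample_t, cpu_state_percent and the TICK_* indices are made up for this example. On Darwin the counters could presumably be read with the Mach call host_statistics() using the HOST_CPU_LOAD_INFO flavor, but that part is left out here so the sketch stays portable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical indices for the four CPU states. On Darwin these would
   presumably map onto CPU_STATE_USER, CPU_STATE_SYSTEM, CPU_STATE_IDLE
   and CPU_STATE_NICE (an assumption, not taken from darwin.c). */
enum { TICK_USER = 0, TICK_SYSTEM, TICK_IDLE, TICK_NICE, TICK_STATES };

/* One snapshot of the cumulative tick counters (hypothetical type). */
typedef struct {
    uint32_t ticks[TICK_STATES];
} cpu_sample_t;

/* Percentage of the ticks elapsed between prev and cur that were spent
   in the given state. Returns 0.0 for an unknown state or when no
   ticks have elapsed. */
float
cpu_state_percent(const cpu_sample_t *prev, const cpu_sample_t *cur, int state)
{
    uint32_t total = 0;
    int i;

    assert(prev && cur);

    if (state < 0 || state >= TICK_STATES)
        return 0.0f;

    /* Unsigned subtraction gives the right delta even if a counter
       wrapped around between the two snapshots. */
    for (i = 0; i < TICK_STATES; i++)
        total += cur->ticks[i] - prev->ticks[i];

    if (total == 0)
        return 0.0f;

    return 100.0f * (float)(cur->ticks[state] - prev->ticks[state]) / (float)total;
}
```

A cpu_system_func along these lines would keep the previous snapshot in a static variable, take a new one on each call, and store cpu_state_percent(&prev, &cur, TICK_SYSTEM) in val.f instead of the constant 0.0.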

thanks,
Craig

