Greetings list! I was only able to get ganglia to function on my system by patching gmetad/rrd_helpers.c, changing err_sys to err_msg for this mkdir() (patch attached).
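For anyone who can't open the attachment, here is a rough sketch of the general idea: a mkdir() wrapper that treats "directory already exists" as harmless and reports other failures without killing the process. This is not the attached patch and not gmetad's actual code. The helper name `ensure_dir` and the 0755 mode are assumptions made for illustration only.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not from the ganglia sources): create a directory,
 * treating "already exists" as success.
 * Returns 0 on success, -1 on any other failure. */
static int ensure_dir(const char *path)
{
    int saved;

    if (mkdir(path, 0755) == 0)
        return 0;               /* directory was created */

    /* Copy errno right away, before any other library call can change it. */
    saved = errno;

    if (saved == EEXIST)
        return 0;               /* something already exists at path; benign here.
                                 * Note: this does not check that it is a directory. */

    /* Report the failure and let the caller decide what to do,
     * instead of terminating the whole daemon. */
    fprintf(stderr, "Unable to mkdir(%s): %s\n", path, strerror(saved));
    errno = saved;
    return -1;
}
```

Saving errno into a local immediately after the failed call keeps the EEXIST comparison and the error message working from the same value, which is the part that looks suspicious in an "Error 0" message.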
I haven't a clue why gmetad insists on repeatedly trying to create this directory on my system, but making this failure more benign seems to work okay... my system is now collecting stats.

I did run this through truss, and it is showing an EEXIST in the truss output. So how the errno != EEXIST portion of this test is passing, I don't know.

Am I the first one to experience this problem? Is there a better workaround?

Thanks for any advice anyone can provide.

On Tue, Dec 23, 2008 at 4:20 PM, Ben Lentz <[email protected]> wrote:
> Greetings list!
> I am having trouble keeping gmetad stable. I am running ganglia-3.1.1
> on AIX 5.3 TL6 SP4 which I have compiled from source. It was compiled
> using gcc and linked with IBM's ld. The configure line was:
>
> ./configure --prefix=$WHERE --enable-shared=yes --enable-static=no
> --with-gmetad --disable-shared
>
> The build seemed to work fine. I have current versions of rrdtool,
> gcc, gnu make/sed, and libconfuse. I configured gmond and started it
> up.
>
> I created a configuration file for gmetad based on the sample provided
> in the source distribution. I created a
> /opt/local/var/lib/ganglia/rrds/unspecified directory owned by the
> 'nobody' user with mode 755. However, when I try and run gmetad in
> debug level 10, it dies with the following error (after a short
> delay):
>
> Going to run as user nobody
> Sources are ...
> Source: [my cluster, step 15] has 1 sources
>         127.0.0.1
> xml listening on port 8651
> interactive xml listening on port 8652
> Data thread 1800 is monitoring [my cluster] data source
>         127.0.0.1
> [my cluster] is a 2.5 or later data stream
> hash_create size = 1024
> hash->size is 1031
> hash_create size = 50
> hash->size is 53
> hash_create size = 50
> hash->size is 53
> Updating host optaixadmin01.cswg.com, metric load_one
> cleanup thread has been started
> RRD_create: msync rrd_file: Invalid argument
> [my cluster] is a 2.5 or later data stream
> Updating host hostname.domainname.tld, metric load_one
> Unable to mkdir(/opt/local/var/lib/ganglia/rrds/unspecified): Error 0
>
> If I try to start it again, it dies again right away with a different
> error (until I remove /opt/local/var/lib/ganglia/rrds/unspecified):
> Going to run as user nobody
> Sources are ...
> Source: [my cluster, step 15] has 1 sources
>         127.0.0.1
> xml listening on port 8651
> interactive xml listening on port 8652
> Data thread 1800 is monitoring [my cluster] data source
>         127.0.0.1
> cleanup thread has been started
> [my cluster] is a 2.5 or later data stream
> hash_create size = 1024
> hash->size is 1031
> hash_create size = 50
> hash->size is 53
> hash_create size = 50
> hash->size is 53
> Updating host hostname.domainname.tld, metric load_one
> Unable to mkdir(/opt/local/var/lib/ganglia/rrds/unspecified): Error 0
>
> If I stop gmond, remove /opt/local/var/lib/ganglia/rrds/unspecified,
> and start only gmetad, the process is stable, but I don't get any
> local statistics.
>
> A few minutes after I start gmond again, gmetad dies.
>
> Am I using this wrong or is it currently unstable on AIX? Isn't it
> possible to gmond on a gmetad server?
>
> Thanks in advance for any pointers you can provide.
>
ganglia.patch
Description: Binary data
------------------------------------------------------------------------------
_______________________________________________ Ganglia-general mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/ganglia-general

