> i'm running Ganglia 3.1.2 on Solaris 10. i have 10 machines in a
> cluster, with one machine's gmond configured for gmetad to talk to.
> everything works fine, except that after a few hours, gmond will get
> stuck, and never update its XML data:
>
> <CLUSTER NAME="Amsterdam" LOCALTIME="1256330745" OWNER="Toolserver
> Administrators" LATLONG="unspecified" URL="unspecified">
> <HOST NAME="hemlock.toolserver.org" IP="91.198.174.194" REPORTED="1256309168"
> TN="21576" TMAX="20" DMAX="0" LOCATION="esams" GMOND_STARTED="1256283448">
>
> notice that 'reported' is a long time behind 'localtime'; this happens
> for all machines in the XML output. however, this is the <HOST> record
> for the machine gmond is running on, so it can't be a network issue, and
> gmond is clearly running. if i restart gmond on this machine, it starts
> collecting data for all machines again.

We have exactly the same problem on our Solaris 10 machines. I mentioned it to this list several months ago and was advised to submit a bug report, which, now that I have some free time, I am about to do!

We have cron set to bounce gmond every hour, which is a helpful workaround.

What we see is the gmond UDP receive channel dying after a random period of time, anywhere between 10 minutes and an hour. We've tried compiling apr with --disable-nonportable-atomics, just in case that made a difference, but the result is the same.

Hmmm, perhaps I should file that bug today!

Paul
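
For anyone who would rather detect the hang than bounce gmond blindly, the symptom described above (REPORTED falling far behind LOCALTIME) can be checked from the XML itself. The following is only a rough sketch, not something we run in production: the function names, the `factor` threshold and the use of port 8649 (the usual default for gmond's TCP accept channel) are assumptions for illustration, so adjust them to match your gmond.conf.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=10.0):
    """Read the XML dump gmond writes on its TCP accept channel.

    Port 8649 is assumed here; use whatever your gmond.conf specifies.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def stale_hosts(xml_data, factor=4):
    """Return (host name, age in seconds) for hosts that look stuck.

    A host counts as stale when LOCALTIME - REPORTED exceeds
    factor * TMAX. The factor of 4 is an arbitrary illustrative choice.
    """
    root = ET.fromstring(xml_data)
    stale = []
    # iter() also matches the root, so this works whether or not the
    # <CLUSTER> element is wrapped in an outer document element.
    for cluster in root.iter("CLUSTER"):
        localtime = int(cluster.get("LOCALTIME"))
        for host in cluster.iter("HOST"):
            age = localtime - int(host.get("REPORTED"))
            tmax = int(host.get("TMAX"))
            if age > factor * tmax:
                stale.append((host.get("NAME"), age))
    return stale


if __name__ == "__main__":
    for name, age in stale_hosts(fetch_gmond_xml()):
        print(f"{name}: last reported {age}s ago")
```

Run from cron, a check like this could trigger the restart only when every host is stale, instead of bouncing gmond every hour regardless. How gmond is actually restarted depends on how it was installed, so that part is left out.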

