> i'm running Ganglia 3.1.2 on Solaris 10.  i have 10 machines in a
> cluster, with one machine's gmond configured for gmetad to talk to.
> everything works fine, except that after a few hours, gmond will get
> stuck, and never update its XML data:
>
> <CLUSTER NAME="Amsterdam" LOCALTIME="1256330745" OWNER="Toolserver 
> Administrators" LATLONG="unspecified" URL="unspecified">
> <HOST NAME="hemlock.toolserver.org" IP="91.198.174.194" REPORTED="1256309168" 
> TN="21576" TMAX="20" DMAX="0" LOCATION="esams" GMOND_STARTED="1256283448">
>
> notice that 'reported' is a long time behind 'localtime'; this happens
> for all machines in the XML output.  however, this is the <HOST> record
> for the machine gmond is running on, so it can't be a network issue, and
> gmond is clearly running.  if i restart gmond on this machine, it starts
> collecting data for all machines again.

We have exactly the same problem on our Solaris 10 machines. I mentioned 
it to this list several months ago and was advised to submit a bug report, 
which, now that I have some free time, I am about to do!

We have a cron job that restarts gmond every hour, which is a helpful 
workaround.
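
A less blunt variant would be to restart only when the data has actually 
gone stale. The following is a rough sketch, not something we run in 
production. It assumes the gmond XML has been saved to a string (for 
example by reading from gmond's TCP port, 8649 by default), and it treats 
a host as stale when LOCALTIME minus REPORTED is more than a few times 
TMAX. The function name stale_hosts and the factor of 4 are my own 
arbitrary choices.

```python
import xml.etree.ElementTree as ET


def stale_hosts(xml_text, factor=4):
    """Return [(host_name, seconds_behind)] for hosts whose REPORTED
    timestamp lags the cluster LOCALTIME by more than factor * TMAX.

    'factor' is an arbitrary threshold chosen for illustration.
    """
    root = ET.fromstring(xml_text)
    stale = []
    # iter() also matches the root element, so this works whether the
    # input is a full gmond dump or a single <CLUSTER> fragment.
    for cluster in root.iter("CLUSTER"):
        localtime = int(cluster.get("LOCALTIME"))
        for host in cluster.iter("HOST"):
            lag = localtime - int(host.get("REPORTED"))
            tmax = int(host.get("TMAX"))
            if lag > factor * tmax:
                stale.append((host.get("NAME"), lag))
    return stale
```

A cron wrapper could call this every few minutes and restart gmond only 
when the returned list is non-empty. With the numbers quoted above, 
hemlock is about 21577 seconds behind against a TMAX of 20, so it would 
be flagged.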

What we see is the gmond UDP receive channel dying after a random period 
of time, anywhere between 10 minutes and one hour. We've tried compiling 
APR with --disable-nonportable-atomics just in case that made a difference, 
but the result is the same.

Hmmm perhaps I should file that bug today!

Paul


_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
