I think I know what the bug is. I can give you a fix to try, but it requires modifying the source. The fix will be part of the next release, 3.0.1.

The problem is in the process_collection_group() function near line 1600 of ./gmond/gmond.c. When there is a communication error, it can return zero, which causes gmond to keep listening for data forever but never send any. The fix is to change the return statement of process_collection_group(), around line 1640 in ./gmond/gmond.c, to the following:

   return next < now ? now + 1 * APR_USEC_PER_SEC : next;

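This ensures that the next collection event always occurs: if the computed time is already in the past (as it is when zero comes back), it is pushed to one second from now. Sorry for any inconvenience this bug may have caused.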
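
To see what that return expression does in isolation, here is a minimal sketch that can be compiled without the gmond or APR sources. The type usec_time_t, the macro USEC_PER_SEC and the helper clamp_next_collection() are stand-ins made up for this illustration (they are not names from gmond.c); they assume, as in APR, that times are 64-bit microsecond counts and that one second is 1000000 microseconds.

```c
#include <stdint.h>

/* Stand-ins for apr_time_t and APR_USEC_PER_SEC (assumption: times are
   64-bit microsecond counts, one second is 1000000 microseconds). */
typedef int64_t usec_time_t;
#define USEC_PER_SEC ((usec_time_t)1000000)

/* Hypothetical helper that isolates the patched return expression.
   If the computed next collection time is already in the past, schedule
   the next collection one second from now; otherwise keep it as is. */
static usec_time_t clamp_next_collection(usec_time_t next, usec_time_t now)
{
    return next < now ? now + 1 * USEC_PER_SEC : next;
}
```

With now at 5000000, a next value of zero comes back as 6000000, while a next value that is not in the past is returned unchanged.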

Please let us know if this fixes your problem. In my tests it has been an effective workaround.

-matt



Rainer Schwierz wrote:
Hello all,

I have upgraded to
ganglia-web-3.0.0-1
ganglia-gmetad-3.0.0-1
ganglia-gmond-3.0.0-1
running Scientific Linux SL303 on all nodes.
I prefer to use unicast for gmond communication.
At first everything works well, but after some hours some nodes disappear.
After one week I only see the two special nodes, which are contacted by
gmetad from the webserver. A telnet to the port on these two hosts shows
that the metric info from the other nodes disappears.
A restart of gmond on the nodes solves the problem again for some hours.
Has anyone seen a similar problem, or does anyone have an idea how to solve
it, before I post my detailed configuration?

Rainer

| Rainer  Schwierz, Inst. f. Kern- und Teilchenphysik |
| TU Dresden,       D-01062 Dresden                   |
| http://iktp.tu-dresden.de/~schwierz/                |





--
PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E F40B 242A 5984 ACBC 91D3'

   They that can give up essential liberty to obtain a little
      temporary safety deserve neither liberty nor safety.
  --Benjamin Franklin, Historical Review of Pennsylvania, 1759
