Hello everyone,

I'm trying to figure out some very puzzling behaviour I'm seeing from Ganglia 
with a ROCKS 4.3 cluster.

Specifically ... if I use the default multicast address for sending and 
receiving (237.105.185.156), I get about five minutes' worth of effective 
monitoring and then my client nodes start appearing as dead (telnetting to the 
Ganglia service tells me their TN time never gets updated), even though they 
are very much alive.

However ... if I switch to the all-systems.mcast.net address (224.0.0.1), no 
such behaviour occurs, and monitoring works perfectly. That's a bit of a snag, 
though, since the gmond.conf files are managed by the ROCKS DB, and editing 
them by hand is both time-consuming and makes my cluster configuration 
non-portable.
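For reference, here is a minimal sketch of the gmond.conf channel stanzas involved. It assumes the Ganglia 3.0-style syntax and the usual default port 8649, neither of which is stated above. Switching to 224.0.0.1 means changing the mcast_join (and bind) values in both stanzas on every node:

```
/* Sketch only: Ganglia 3.0-style gmond.conf channels; port 8649 is assumed */
udp_send_channel {
  mcast_join = 237.105.185.156
  port = 8649
}

udp_recv_channel {
  mcast_join = 237.105.185.156
  port = 8649
  bind = 237.105.185.156
}
```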


I've tried adding explicit multicast routes for the Ganglia address, but that 
doesn't make a difference. If I run tcpdump on the head node, I can see 
multicast traffic to 237.105.185.156 coming in on the proper interface, but 
only from the head node itself. Watching the client nodes tells me they are 
sending on the right interface too ... yet none of their packets ever show up 
on the head node.

If I ping 237.105.185.156 from a shell on any node in the cluster, I get a 
response only from the head node; however, if I ping 224.0.0.1, I get (as 
expected) a response from all my nodes.
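One way to check whether a node actually receives traffic for 237.105.185.156, independent of gmond, is a small listener that joins the group and reports what arrives. This is a minimal Python sketch. It assumes gmond uses port 8649 and that gmond is stopped on the node running it, since otherwise the bind may fail. The helpers make_mreq and listen are hypothetical and are not part of Ganglia or ROCKS:

```python
import socket

# Group from the post; port 8649 is an ASSUMPTION (Ganglia's usual gmond port).
GROUP = "237.105.185.156"
PORT = 8649


def make_mreq(group: str, iface_addr: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq for IP_ADD_MEMBERSHIP: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(iface_addr)


def listen(group: str = GROUP, port: int = PORT, iface_addr: str = "0.0.0.0",
           timeout: float = 30.0, max_packets: int = 10) -> list:
    """Join the group on the given interface and return (sender, size) pairs
    for the packets received before the timeout expires."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Ask the kernel to join the multicast group (this sends an IGMP report).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_mreq(group, iface_addr))
    sock.settimeout(timeout)
    seen = []
    try:
        while len(seen) < max_packets:
            data, (addr, _port) = sock.recvfrom(65535)
            seen.append((addr, len(data)))
    except socket.timeout:
        pass  # stop quietly once no more packets arrive
    finally:
        sock.close()
    return seen
```

On a client, call listen() from an interactive Python session, passing the address of the private interface as iface_addr, while the other nodes' gmond daemons are sending. An empty list after the timeout would suggest that traffic for the group is not reaching that node at all.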

I've been scratching my head over this for almost a week now; I've combed the 
mailing list archives for both Ganglia and ROCKS in hopes of gaining some 
insights, but nobody appears to have encountered a problem like this one 
before. I don't think it's actually a Ganglia issue at its heart, but if I can 
figure out why it's manifesting these symptoms, I might be able to fix the 
problem overall.

Any help/suggestions/advice would be very much appreciated -- it's very 
frustrating!

thanks,
Klaus

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general