Hello everyone,

I'm trying to figure out some very puzzling behaviour I'm seeing from Ganglia on a ROCKS 4.3 cluster.
Specifically: if I use the default multicast address for sending and receiving (237.105.185.156), I get about five minutes' worth of effective monitoring, and then my client nodes start appearing as dead (telnetting to the Ganglia service shows that their TN time never gets updated), even though they are very much alive. However, if I switch to the all-systems.mcast.net address (224.0.0.1), no such behaviour occurs, and monitoring works perfectly. That's a bit of a snag, though, since the gmond.conf files are managed by the ROCKS DB, and editing them by hand is time-consuming and makes my cluster configuration non-portable.

I've tried adding explicit multicast routes for the Ganglia address, but that makes no difference. If I run tcpdump on the head node, I can see multicast traffic to 237.105.185.156 arriving on the proper interface, but only from the head node itself. Watching the client nodes tells me they are sending on the right interface, but nothing from them ever shows up on the head node. If I ping 237.105.185.156 from a shell on any node in the cluster, I get a response only from the head node; if I ping 224.0.0.1, I get (as expected) a response from all my nodes.

I've been scratching my head over this for almost a week now. I've combed the mailing list archives for both Ganglia and ROCKS in hopes of gaining some insight, but nobody appears to have encountered a problem like this before. I don't think it's actually a Ganglia issue at heart, but if I can figure out why it's manifesting these symptoms, I might be able to fix the problem overall. Any help, suggestions, or advice would be very much appreciated; it's very frustrating!

thanks,
Klaus
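One way to check, independently of gmond, whether traffic to 237.105.185.156 really stops arriving after a few minutes is a small listener that joins the group and timestamps every datagram it receives. The following is a minimal Python sketch, not part of Ganglia or ROCKS. Port 8649 is an assumption (Ganglia's usual gmond port) and should be checked against the port in the ROCKS-generated gmond.conf.

```python
import socket
import struct
import sys
import time

# Group is the ROCKS default quoted above; PORT is an ASSUMPTION
# (Ganglia's usual gmond port), so check it against gmond.conf.
GROUP = "237.105.185.156"
PORT = 8649


def membership_request(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Build the struct ip_mreq expected by IP_ADD_MEMBERSHIP.

    iface_ip selects the local interface by its address;
    0.0.0.0 (INADDR_ANY) lets the kernel choose one.
    """
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def open_listener(group: str, port: int, iface_ip: str) -> socket.socket:
    """Open a UDP socket bound to `port` and joined to `group` on `iface_ip`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # SO_REUSEADDR only lets this socket share the port with other sockets
    # that also set it; if gmond holds the port without it, bind() fails
    # and gmond has to be stopped first.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    return sock


def watch(group: str, port: int, iface_ip: str) -> None:
    """Print elapsed time, size and sender for every datagram received."""
    sock = open_listener(group, port, iface_ip)
    start = time.time()
    print(f"joined {group}:{port} on {iface_ip}")
    while True:
        data, (src, _) = sock.recvfrom(65535)
        print(f"{time.time() - start:8.1f}s  {len(data):5d} bytes from {src}")


# Only start listening when an interface address is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    watch(GROUP, PORT, sys.argv[1])
```

Run on the head node with its cluster-side interface address as the only argument and left for well over five minutes, this would show whether datagrams from the client nodes stop at the same point their TN values stop updating, which helps separate a network-level multicast problem from a gmond one.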
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general

