I downloaded the latest stable version of Ganglia (3.2), compiled it, and installed it on my Hadoop cluster, configuring it according to the documented procedures. I am using hadoop-0.20.2-cdh31.
I copied the gmond.conf from the distribution to each node as-is. It has what look like default values: 239.2.11.71 for mcast_join and port 8649 throughout. The core (non-Hadoop) Ganglia reporting works fine, but Hadoop metrics do not show up in any reproducible way. Once I got Hadoop metrics reported for one node, and once telnet localhost 8649 showed them for a *different* node, but usually I get no Hadoop metrics at all. Restarting the cluster and/or gmond may or may not change the behavior, which is frustrating because it seems random. Could this be a version compatibility problem? If there are release notes describing one, I didn't see them on the Ganglia site. At this point I'm tempted to give up on Ganglia for Hadoop metrics and look for alternatives. Any ideas?
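
In case it helps anyone reproduce this, the telnet check can be scripted so it is easier to repeat after each restart. This is only a rough sketch under a few assumptions: gmond serves its XML dump on TCP port 8649 of the host you point it at, metrics appear as `METRIC` elements nested under `HOST` elements, and Hadoop metric names start with prefixes like `dfs.`, `mapred.`, `jvm.` or `rpc.` (that prefix list is a guess and may need adjusting). The function names are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET

# Assumed prefixes for Hadoop metric names; adjust to what your cluster emits.
HADOOP_PREFIXES = ("dfs.", "mapred.", "jvm.", "rpc.")


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything gmond writes on its TCP port until it closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def hadoop_metrics_by_host(xml_bytes, prefixes=HADOOP_PREFIXES):
    """Map each HOST name in the dump to its sorted Hadoop-looking metric names."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        names = [m.get("NAME", "") for m in host.iter("METRIC")]
        # str.startswith accepts a tuple, so any one prefix is enough to match.
        result[host.get("NAME")] = sorted(n for n in names if n.startswith(prefixes))
    return result
```

Calling `hadoop_metrics_by_host(fetch_gmond_xml())` on a node and printing the result shows, per host, which Hadoop-looking metrics gmond currently knows about. An empty list for a host means only the core metrics are arriving for it.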
