I noticed my logs filling with "Error 1 sending the modular data".
Google reveals this has been discussed several times in the past, and none of the discussions ended with a solution, so I'm presenting some analysis below.

Here is what I did and what I found:

I discovered my gmond PID = 21015 and I checked it with strace:

    strace -p 21015 -o /tmp/gmond.errs -v

After about a minute, I had a look inside /tmp/gmond.errs and found lots of this:

    write(7, "\0\0\0\205\0\0\0\4srv1\0\0\0\fmachine_type\0\0\0\0"..., 52) = 52
    write(8, "\0\0\0\205\0\0\0\4srv1\0\0\0\fmachine_type\0\0\0\0"..., 52) = -1 EINVAL (Invalid argument)
    write(7, "\0\0\0\200\0\0\0\4srv1\0\0\0\7os_name\0\0\0\0\0\0\0\0\6"..., 164) = 164
    write(8, "\0\0\0\200\0\0\0\4srv1\0\0\0\7os_name\0\0\0\0\0\0\0\0\6"..., 164) = -1 EINVAL (Invalid argument)
    time([1351418592]) = 1351418592
    sendto(9, "<30>Oct 28 11:03:12 /usr/sbin/gm"..., 90, MSG_NOSIGNAL, NULL, 0) = 90

Notice that the `sendto' is actually sending the error to syslog, not sending a metric packet.

OK, the `write' calls show me two file descriptors, 7 and 8. Writes to FD 8 are failing with EINVAL:

    write(8, .... ) = -1 EINVAL (Invalid argument)

The file descriptors correspond to two different udp_send_channels in gmond.conf, but which is which? Fortunately, lsof tells me:

    lsof -p 21015 -n

    gmond 21015 ganglia 7u IPv4 2747622 0t0 UDP 192.168.1.2:44778->239.2.11.71:8649
    gmond 21015 ganglia 8u IPv4 2747628 0t0 UDP (VPN address):53976->(remote server address):8649

Notice that FD 7 corresponds to a very standard multicast channel, while FD 8 corresponds to a UDP unicast channel.

I have redacted the IP addresses, but the output immediately revealed the problem (in my case anyway): the local address (VPN address) existed when gmond started, but no longer exists on this machine (because the VPN is not always up).

I can imagine similar problems would occur for hosts that get an IP by means of DHCP, or hosts that have an IPsec tunnel, PPP or some other transient interface.
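To illustrate why a stale local address can stick to a send channel, here is a minimal Python sketch. It is not gmond's code (gmond is written in C), and the function name open_send_channel is hypothetical. It only assumes what the lsof output suggests: that the channel is a connected UDP socket. Calling connect() on a UDP socket makes the kernel choose a local address and port once, and the socket keeps that pair for its lifetime, which is the "local->remote" pair lsof prints. The loopback address stands in for the redacted addresses above.

```python
import socket


def open_send_channel(host, port):
    """Open a connected UDP socket (hypothetical helper, for illustration only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # connect() on a UDP socket sends nothing; it only records the peer and
    # makes the kernel pick a local address and port for the socket.
    sock.connect((host, port))
    return sock


# 127.0.0.1 is a stand-in for the remote server; 8649 is the port seen in lsof.
channel = open_send_channel("127.0.0.1", 8649)

local_addr = channel.getsockname()   # chosen once, at connect() time
remote_addr = channel.getpeername()

# Print the pair in roughly the form lsof shows it.
print("UDP %s:%d->%s:%d" % (local_addr + remote_addr))

channel.close()
```

The local half of that pair is not re-evaluated on later writes. If the interface that owned it goes away, the socket still refers to the old address, which would be consistent with the EINVAL seen on FD 8.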
If anyone else sees the problem, I would be interested to see your strace and lsof output.

I believe gmond could be tweaked, for example, to recreate (or re-bind) the socket with FD 8 after such an error. Doing so might log a more specific error or might successfully bind on a new local IP.

Regards,

Daniel

_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general