I noticed my logs filling up with "Error 1 sending the modular data".

A Google search shows this has been discussed several times before, but
none of those discussions reached a solution, so I'm presenting some
analysis below.

Here is what I did and what I found:

My gmond PID is 21015, so I attached strace to it:

   strace -p 21015 -o /tmp/gmond.errs -v

After about a minute, I looked inside /tmp/gmond.errs and found lots of this:

write(7, "\0\0\0\205\0\0\0\4srv1\0\0\0\fmachine_type\0\0\0\0"..., 52) = 52
write(8, "\0\0\0\205\0\0\0\4srv1\0\0\0\fmachine_type\0\0\0\0"..., 52) = -1 EINVAL (Invalid argument)
write(7, "\0\0\0\200\0\0\0\4srv1\0\0\0\7os_name\0\0\0\0\0\0\0\0\6"..., 164) = 164
write(8, "\0\0\0\200\0\0\0\4srv1\0\0\0\7os_name\0\0\0\0\0\0\0\0\6"..., 164) = -1 EINVAL (Invalid argument)
time([1351418592])                      = 1351418592
sendto(9, "<30>Oct 28 11:03:12 /usr/sbin/gm"..., 90, MSG_NOSIGNAL, NULL, 0) = 90


Notice that the `sendto' call is actually sending the error message to
syslog; it is not a metric packet.

OK, the `write' calls show two file descriptors, 7 and 8.  Writes to
FD 8 are failing with EINVAL:

write(8, .... ) = -1 EINVAL (Invalid argument)
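With a longer capture, counting these by hand gets tedious. Here is a
minimal Python sketch (my own hypothetical helper, not part of gmond or
strace) that tallies failed write() calls per file descriptor. It
assumes one syscall per line, which is how strace -o writes the file;
the line wrapping in the excerpt above comes from the mail client.

```python
import re
from collections import Counter

# Matches an unwrapped strace line such as:
#   write(8, "...", 52) = -1 EINVAL (Invalid argument)
# capturing the file descriptor and the errno name.
FAILED_WRITE = re.compile(r'^write\((\d+),.*\)\s+=\s+-1\s+(E[A-Z]+)')


def failed_writes_by_fd(lines):
    """Count failed write() calls per (fd, errno) pair."""
    counts = Counter()
    for line in lines:
        m = FAILED_WRITE.match(line)
        if m:
            counts[(int(m.group(1)), m.group(2))] += 1
    return counts
```

Running it over the file (for example
`with open("/tmp/gmond.errs") as f: print(failed_writes_by_fd(f))`)
should show every failure landing on FD 8 with EINVAL, while the
successful writes to FD 7 and the syslog `sendto' are ignored.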

The file descriptors correspond to two different udp_send_channels in
gmond.conf - but which is which?  Fortunately, lsof tells me:

  lsof -p 21015 -n

gmond   21015 ganglia    7u  IPv4            2747622      0t0      UDP 192.168.1.2:44778->239.2.11.71:8649
gmond   21015 ganglia    8u  IPv4            2747628      0t0      UDP (VPN address):53976->(remote server address):8649
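If you want to map FDs to endpoints in a script rather than by eye, a
rough Python parser for lsof lines like these might look as follows.
The column layout is assumed from this output only (other lsof versions
or options may differ), and the function name is hypothetical.

```python
import re

# Assumed layout: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME,
# where FD looks like "8u" and NAME looks like "local:port->remote:port".
LSOF_UDP = re.compile(
    r'^\S+\s+\d+\s+\S+\s+(\d+)[A-Za-z]*\s+IPv4\s+.*UDP\s+'
    r'(\S+):(\d+)->(\S+):(\d+)$'
)


def parse_udp_line(line):
    """Return (fd, local_ip, local_port, remote_ip, remote_port) or None."""
    m = LSOF_UDP.match(line.strip())
    if not m:
        return None
    fd, lip, lport, rip, rport = m.groups()
    return int(fd), lip, int(lport), rip, int(rport)
```

For the FD 7 line above it returns
`(7, '192.168.1.2', 44778, '239.2.11.71', 8649)`; the local IP in the
FD 8 tuple is the address worth checking.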

Notice that FD 7 corresponds to a very standard multicast channel, while
FD 8 corresponds to a UDP unicast channel.  I have redacted the IP
addresses, but this immediately revealed the problem (in my case,
anyway): the local address (the VPN address) existed when gmond started
but no longer exists on this machine, because the VPN is not always up.
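One quick way to confirm that a socket's local address has vanished is
to try binding a throwaway UDP socket to it. A minimal Python sketch,
assuming Linux semantics where binding to an address that is not
configured on any interface fails with EADDRNOTAVAIL (this does not
hold if the net.ipv4.ip_nonlocal_bind sysctl is enabled):

```python
import errno
import socket


def local_address_exists(ip):
    """Return True if `ip` is currently assigned to a local interface.

    Binding to (ip, 0) succeeds only when the address is configured on
    this host; otherwise the kernel reports EADDRNOTAVAIL. Any other
    error is re-raised rather than guessed at.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.bind((ip, 0))  # port 0: let the kernel pick any free port
        return True
    except OSError as e:
        if e.errno == errno.EADDRNOTAVAIL:
            return False
        raise
    finally:
        s.close()
```

Run against the local IP that lsof reports for FD 8, it should return
False while the VPN is down and True once it is back up.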

I imagine similar problems would occur on hosts that get their IP
address by DHCP, or hosts with IPsec tunnels, PPP links or other
transient interfaces.

If anyone else sees this problem, it would be interesting to see your
strace and lsof output.  I believe gmond could be tweaked, for example,
to recreate (or re-bind) the socket behind FD 8 after such an error.
Doing so might log a more specific error, or might successfully bind to
a new local IP.
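As a rough illustration of that idea (in Python rather than gmond's C,
and with hypothetical names that do not exist in gmond), a sender could
catch EINVAL, discard the socket, create a fresh one that is connected
but not explicitly bound, so the kernel picks a source address from the
interfaces that exist at that moment, and retry once:

```python
import errno
import socket


def make_udp_socket(dest):
    """Create a UDP socket connected to dest without binding a source
    address, so the kernel chooses one from the current interfaces."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(dest)
    return s


class ReconnectingSender:
    """Hypothetical sender that rebuilds its socket once on EINVAL."""

    def __init__(self, dest, factory=make_udp_socket):
        self.dest = dest
        self.factory = factory  # injectable so it can be tested offline
        self.sock = factory(dest)

    def send(self, payload):
        try:
            return self.sock.send(payload)
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise
            # The old source address has probably disappeared (e.g. the
            # VPN went down): throw the socket away and retry once.
            self.sock.close()
            self.sock = self.factory(self.dest)
            return self.sock.send(payload)
```

If the retry also fails (for example, connect() raising ENETUNREACH
because there is no route to the remote server while the VPN is down),
the error propagates to the caller, which could then log something more
specific than the generic message.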

Regards,

Daniel

_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
