Ganglia 3.1.0 on CentOS 4.

Earlier today one of my clusters stopped reporting.
The grid server logged this to syslog:
 /usr/sbin/gmetad[12271]: poll() timeout for [Web] data source after 0 bytes read

I checked that gmond was running on that host, and it was.
However, attempts to connect to its port 8649 did indeed time out.
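
For anyone who wants to reproduce the check, here is a minimal sketch of
the kind of probe I mean. It assumes gmond dumps its XML and closes the
connection on the TCP port, and it treats a stalled connect or a stalled
read the same way. The function name probe_gmond and the 5 second default
are my own choices for illustration, not anything from Ganglia.

```python
import socket


def probe_gmond(host, port=8649, timeout=5.0):
    """Connect to a gmond TCP port and read until the peer closes.

    Returns (status, nbytes), where status is one of:
      "ok"      - the peer sent data and closed the connection
      "empty"   - the peer closed the connection without sending anything
      "timeout" - the connect or a read stalled longer than `timeout`
      "refused" - nothing is listening on that port
    """
    received = 0
    try:
        # The timeout given here also applies to the recv() calls below.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            while True:
                chunk = sock.recv(65536)
                if not chunk:
                    return ("ok" if received else "empty", received)
                received += len(chunk)
    except socket.timeout:
        return ("timeout", received)
    except ConnectionRefusedError:
        return ("refused", received)
```

Calling probe_gmond("localhost") on the affected host should come back
with "timeout" and 0 bytes if the daemon is in the state described here,
which would line up with the gmetad message above.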

I tried to see what it was doing and got:

# strace -p 16830
Process 16830 attached - interrupt to quit
write(7, "<EXTRA_DATA>\n", 13 <unfinished ...>
Process 16830 detached

... I had to ^C after a minute.
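
If this happens again, it would help to know what descriptor 7 in that
write() call actually is. A small sketch, assuming a Linux /proc and
enough privileges to inspect the process (root or the ganglia user);
describe_fd is a hypothetical helper name of mine:

```python
import os


def describe_fd(pid, fd):
    """Return the target of /proc/<pid>/fd/<fd>, or None if it does not exist.

    Regular files resolve to their path; sockets show up as "socket:[inode]".
    """
    try:
        return os.readlink(f"/proc/{pid}/fd/{fd}")
    except FileNotFoundError:
        return None
```

For the case above that would be describe_fd(16830, 7), run while the
process is still stuck.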

I captured lsof output, then restarted gmond, and it started working.

Here's the lsof output:

# lsof | grep gmond
gmond     16830   ganglia  cwd       DIR                8,3      4096          2 /
gmond     16830   ganglia  rtd       DIR                8,3      4096          2 /
gmond     16830   ganglia  txt       REG                8,3     62688    3446145 /usr/sbin/gmond
gmond     16830   ganglia  mem       REG                8,3  48517056    2084173 /usr/lib/locale/locale-archive
gmond     16830   ganglia  mem       REG                8,3     64872    2279574 /usr/lib64/ganglia/modcpu.so
gmond     16830   ganglia  mem       REG                8,3     62512    2279575 /usr/lib64/ganglia/moddisk.so
gmond     16830   ganglia  mem       REG                8,3     62480    2279576 /usr/lib64/ganglia/modload.so
gmond     16830   ganglia  mem       REG                8,3     63720    2279577 /usr/lib64/ganglia/modmem.so
gmond     16830   ganglia  mem       REG                8,3     62824    2279579 /usr/lib64/ganglia/modnet.so
gmond     16830   ganglia  mem       REG                8,3     62224    2279580 /usr/lib64/ganglia/modproc.so
gmond     16830   ganglia  mem       REG                8,3     63432    2279581 /usr/lib64/ganglia/modsys.so
gmond     16830   ganglia  mem       REG                8,3     56902     966686 /lib64/libnss_files-2.3.4.so
gmond     16830   ganglia  mem       REG                8,3    105080     966890 /lib64/ld-2.3.4.so
gmond     16830   ganglia  mem       REG                8,3   1493409     966891 /lib64/tls/libc-2.3.4.so
gmond     16830   ganglia  mem       REG                8,3     11784     966731 /lib64/libuuid.so.1.2
gmond     16830   ganglia  mem       REG                8,3     17943     966893 /lib64/libdl-2.3.4.so
gmond     16830   ganglia  mem       REG                8,3    106203     966894 /lib64/tls/libpthread-2.3.4.so
gmond     16830   ganglia  mem       REG                8,3     91412     966660 /lib64/libresolv-2.3.4.so
gmond     16830   ganglia  mem       REG                8,3     30070     966906 /lib64/libcrypt-2.3.4.so
gmond     16830   ganglia  mem       REG                8,3    143336    3801241 /usr/lib64/libexpat.so.0.5.0
gmond     16830   ganglia  mem       REG                8,3    107187     966901 /lib64/libnsl-2.3.4.so
gmond     16830   ganglia  mem       REG                8,3     88824    3801183 /usr/lib64/libganglia-3.1.0.so.0.0.0
gmond     16830   ganglia  mem       REG                8,3    171976    3801192 /usr/lib64/libapr-1.so.0.3.2
gmond     16830   ganglia  mem       REG                8,3     46392    3801185 /usr/lib64/libconfuse.so.0.0.0
gmond     16830   ganglia    0r      CHR                1,3                 1977 /dev/null
gmond     16830   ganglia    1w      CHR                1,3                 1977 /dev/null
gmond     16830   ganglia    2w      CHR                1,3                 1977 /dev/null
gmond     16830   ganglia    3r     0000                0,8         0    2039262 eventpoll
gmond     16830   ganglia    4u     IPv4            2039264                  UDP 239.192.0.127:8649
gmond     16830   ganglia    5u     IPv4            2039266                  TCP *:8649 (LISTEN)

Anyone seen this?  Any clues as to what might have put it in this state?

P.S. gmetad should've fallen back on another data source, but that's
another email thread that we've already had :)
-- Cos

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
