Hi folks,
It looks like gmetad ignores reports from a gmond that returns records
with large negative TN values.
gmond started behaving this way after the computer was restarted.
Here's a sample of gmond's output acquired with "nc localhost 8649":
<GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
<CLUSTER NAME="host1" LOCALTIME="1247467796" OWNER="BIT"
LATLONG="unspecified" URL="unspecified">
<HOST NAME="localhost" IP="127.0.0.1" REPORTED="1247478928"
TN="-11132" TMAX="20" DMAX="0" LOCATION="unspecified"
GMOND_STARTED="1247478927">
<METRIC NAME="tcp_closed" VAL="0" TYPE="uint32" UNITS="Sockets"
TN="-11143" TMAX="20" DMAX="0" SLOPE="both">
...
</METRIC>
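For what it's worth, the host-level numbers in the sample are consistent with TN = LOCALTIME - REPORTED: 1247467796 - 1247478928 = -11132. GMOND_STARTED (1247478927) is also later than LOCALTIME, so REPORTED seems to lie in the "future" relative to gmond's current clock. Below is a rough Python sketch for spotting such hosts in a gmond dump. The function names are made up for illustration, and the closing tags in SAMPLE were added only so the truncated fragment parses; treat the TN relationship as an observation from this one sample, not as a statement about gmond internals.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything gmond writes on connect, like `nc host port`."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def negative_tn_hosts(xml_bytes):
    """Return (cluster, host, TN, LOCALTIME - REPORTED) for hosts with TN < 0."""
    root = ET.fromstring(xml_bytes)
    found = []
    for cluster in root.iter("CLUSTER"):
        localtime = int(cluster.get("LOCALTIME"))
        for host in cluster.iter("HOST"):
            tn = int(host.get("TN"))
            if tn < 0:
                diff = localtime - int(host.get("REPORTED"))
                found.append((cluster.get("NAME"), host.get("NAME"), tn, diff))
    return found


# Host-level attributes copied from the sample above; the closing tags are
# an addition (assumption) so that the truncated fragment is well-formed.
SAMPLE = b"""<GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
<CLUSTER NAME="host1" LOCALTIME="1247467796" OWNER="BIT" LATLONG="unspecified" URL="unspecified">
<HOST NAME="localhost" IP="127.0.0.1" REPORTED="1247478928" TN="-11132" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1247478927">
</HOST>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(negative_tn_hosts(SAMPLE))
```

Against a live daemon, negative_tn_hosts(fetch_gmond_xml()) should do the same check on the real output.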
I believe these large negative TN values somehow make gmetad, gstat,
etc. think the host is down. Here's what gstat says:
CLUSTER INFORMATION
Name: host1
Hosts: 0
Gexec Hosts: 0
Dead Hosts: 1
Localtime: Mon Jul 13 10:55:02 2009
But gmond is definitely alive; here's some output from strace:
$ sudo strace -p 15911
Process 15911 attached - interrupt to quit
epoll_wait(3, {{EPOLLIN, {u32=7117640, u64=7117640}}}, 2, 10627587) = 1
accept(5, {sa_family=AF_INET, sin_port=htons(42589),
sin_addr=inet_addr("192.168.4.10")}, [140733193388048]) = 7
write(7, "<?xml version=\"1.0\" encoding=\"ISO"..., 2489) = 2489
write(7, "<GANGLIA_XML VERSION=\"3.1.2\" SOUR"..., 45) = 45
After restarting gmond, everything is fine again.
Any ideas what could be causing this strange behavior?
--
Best regards, Pavel