>>> On 7/13/2009 at 1:06 AM, in message
<[email protected]>, Pavel Shevaev
<[email protected]> wrote:
> Hi folks,
>
> Looks like gmetad ignores reports from gmond returning records with
> large negative TN values.
> gmond started to behave like that after the computer was restarted.
>
> Here's a sample of gmond's output acquired with "nc localhost 8649":
>
> <GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
> <CLUSTER NAME="host1" LOCALTIME="1247467796" OWNER="BIT"
> LATLONG="unspecified" URL="unspecified">
> <HOST NAME="localhost" IP="127.0.0.1" REPORTED="1247478928"
> TN="-11132" TMAX="20" DMAX="0" LOCATION="unspecified"
> GMOND_STARTED="1247478927">
> <METRIC NAME="tcp_closed" VAL="0" TYPE="uint32" UNITS="Sockets"
> TN="-11143" TMAX="20" DMAX="0" SLOPE="both">
> ...
> </METRIC>
>
> I believe these large negative TN values somehow make gmetad, gstat,
> etc think the host is down. Here's what gstat says:
>
> CLUSTER INFORMATION
> Name: host1
> Hosts: 0
> Gexec Hosts: 0
> Dead Hosts: 1
> Localtime: Mon Jul 13 10:55:02 2009
>
> But gmond is definitely alive, here's some output from strace:
>
> $ sudo strace -p 15911
> Process 15911 attached - interrupt to quit
> epoll_wait(3, {{EPOLLIN, {u32=7117640, u64=7117640}}}, 2, 10627587) = 1
> accept(5, {sa_family=AF_INET, sin_port=htons(42589),
> sin_addr=inet_addr("192.168.4.10")}, [140733193388048]) = 7
> write(7, "<?xml version=\"1.0\" encoding=\"ISO"..., 2489) = 2489
> write(7, "<GANGLIA_XML VERSION=\"3.1.2\" SOUR"..., 45) = 45
>
> After restarting gmond everything becomes fine.
>
> Any ideas on what can be the reason of such a strange behavior?
The only thing that I know of that would cause this behavior is system
clocks that are out of sync across your nodes. TN reports the timestamp
offset between the time that the metric was actually gathered and the
time that it is being reported to gmetad. If the system clock on the node
that gathers the metric is ahead of the system clock on the node that
reports the metrics to gmetad, the calculation that determines TN can go
negative.

Check that the system clocks on all of the nodes running gmond are in
sync.

Brad

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
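
As a quick sanity check of the clock explanation, the numbers in the quoted sample can be compared directly. This is only a sketch: it assumes TN behaves like "reporter's clock (LOCALTIME) minus the timestamp of the last report (REPORTED)", which matches the sample but is not taken from gmond's source. The helper names `attr` and `apparent_tn` are made up for illustration, and `SAMPLE` is a trimmed copy of the attributes from the quoted output.

```python
import re

# Subset of the attributes copied from the gmond output quoted above.
SAMPLE = '''<CLUSTER NAME="host1" LOCALTIME="1247467796" OWNER="BIT">
<HOST NAME="localhost" IP="127.0.0.1" REPORTED="1247478928"
TN="-11132" TMAX="20" DMAX="0" GMOND_STARTED="1247478927">'''


def attr(xml_text: str, name: str) -> int:
    """Return the first integer attribute called `name` (hypothetical helper)."""
    match = re.search(r'\b%s="(-?\d+)"' % re.escape(name), xml_text)
    if match is None:
        raise KeyError(name)
    return int(match.group(1))


def apparent_tn(xml_text: str) -> int:
    # Assumption: TN = reporter's clock (LOCALTIME) - time of last report (REPORTED).
    return attr(xml_text, "LOCALTIME") - attr(xml_text, "REPORTED")


tn = apparent_tn(SAMPLE)
print("computed TN:", tn, "reported TN:", attr(SAMPLE, "TN"))
if tn < 0:
    # A negative value means LOCALTIME is earlier than REPORTED.
    print("LOCALTIME is behind REPORTED by", -tn, "seconds")
```

For the sample above the computed value equals the reported host TN of -11132. GMOND_STARTED is also later than LOCALTIME in that output, and the host is localhost, so it may be worth checking whether that machine's own clock was set back after gmond started at boot.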

