>>> On 7/13/2009 at 1:06 AM, in message
<[email protected]>, Pavel Shevaev
<[email protected]> wrote:
> Hi folks,
> 
> Looks like gmetad ignores reports from gmond returning records with
> large negative TN values.
> gmond started to behave like that after the computer was restarted.
> 
> Here's a sample of gmond's output acquired with "nc localhost 8649":
> 
> <GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">
> <CLUSTER NAME="host1" LOCALTIME="1247467796" OWNER="BIT"
> LATLONG="unspecified" URL="unspecified">
> <HOST NAME="localhost" IP="127.0.0.1" REPORTED="1247478928"
> TN="-11132" TMAX="20" DMAX="0" LOCATION="unspecified"
> GMOND_STARTED="1247478927">
> <METRIC NAME="tcp_closed" VAL="0" TYPE="uint32" UNITS="Sockets"
> TN="-11143" TMAX="20" DMAX="0" SLOPE="both">
> ...
> </METRIC>
> 
> I believe these large negative TN values somehow make gmetad, gstat,
> etc. think the host is down. Here's what gstat says:
> 
> CLUSTER INFORMATION
>       Name: host1
>       Hosts: 0
> Gexec Hosts: 0
>  Dead Hosts: 1
>   Localtime: Mon Jul 13 10:55:02 2009
> 
> But gmond is definitely alive, here's some output from strace:
> 
>  $ sudo strace -p 15911
> Process 15911 attached - interrupt to quit
> epoll_wait(3, {{EPOLLIN, {u32=7117640, u64=7117640}}}, 2, 10627587) = 1
> accept(5, {sa_family=AF_INET, sin_port=htons(42589),
> sin_addr=inet_addr("192.168.4.10")}, [140733193388048]) = 7
> write(7, "<?xml version=\"1.0\" encoding=\"ISO"..., 2489) = 2489
> write(7, "<GANGLIA_XML VERSION=\"3.1.2\" SOUR"..., 45) = 45
> 
> After restarting gmond everything becomes fine.
> 
> Any ideas on what could be causing such strange behavior?


The only thing I know of that would cause this behavior is system clocks that 
are out of sync across your nodes.  TN reports the offset, in seconds, between 
the time the metric was actually gathered and the time it is being reported to 
gmetad.  If the system clock on the node that gathers the metric is ahead of 
the system clock on the node that reports the metrics to gmetad, the 
calculation that determines TN can go negative.  Check that the system clocks 
on all of the nodes running gmond are in sync.
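
To make the arithmetic concrete, here is a minimal Python sketch using the 
numbers from the quoted gmond output.  It assumes TN is simply the reporting 
node's current time (LOCALTIME) minus the time the host was last heard from 
(REPORTED); that formula is an assumption inferred from the sample, not taken 
from the gmond source.  The helper names attr() and expected_tn() are made up 
for illustration.

```python
import re

# Attribute values copied from the gmond output quoted above
# (other attributes omitted for brevity).
SAMPLE = '''<CLUSTER NAME="host1" LOCALTIME="1247467796" OWNER="BIT">
<HOST NAME="localhost" IP="127.0.0.1" REPORTED="1247478928"
TN="-11132" TMAX="20" DMAX="0" GMOND_STARTED="1247478927">'''


def attr(xml, name):
    """Return the integer value of the first name="..." attribute in xml."""
    match = re.search(r'\b%s="(-?\d+)"' % re.escape(name), xml)
    if match is None:
        raise KeyError(name)
    return int(match.group(1))


def expected_tn(localtime, reported):
    # Assumption: TN = reporting node's clock - time the host last reported.
    return localtime - reported


localtime = attr(SAMPLE, "LOCALTIME")
reported = attr(SAMPLE, "REPORTED")
reported_tn = attr(SAMPLE, "TN")
computed_tn = expected_tn(localtime, reported)

# A negative value means REPORTED lies in the future relative to LOCALTIME.
print("computed TN = %d, TN in XML = %d" % (computed_tn, reported_tn))
# prints: computed TN = -11132, TN in XML = -11132
```

Under that assumption the computed value matches the TN in the sample exactly, 
so the REPORTED timestamp is about three hours ahead of LOCALTIME.  
GMOND_STARTED is also later than LOCALTIME, which would be consistent with the 
clock having been set back after gmond started, although nothing in the output 
confirms that.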

Brad

