Re: [Ganglia-developers] The third rule in a crisis situation

Steven Wagner Mon, 09 Sep 2002 12:53:26 -0700

matt massie wrote:

Today, Steven Wagner wrote forth saying...
I'll tell you this, though - I'm seeing gaps in both cluster graphs
like you wouldn't believe.  I don't know what's causing it, but I
suspect it's related to gmetad deciding so quickly that its data sources
are dead.  It can't be an "it's an old version" thing because it
happens on both new and old versions...
if you take a look at data_thread.c (around line 74), you'll see how the C
gmetad is pulling the data.  i'm using 10-second timeouts per 1024 bytes
of data read.  this means that if a data source is unable to deliver 102.4
bytes/sec (about 820 bits/s), it is considered down.  you might play with the
values to see what works best for you.  if the timeout is too long, then
you'll miss RRD heartbeats and have dead spots anyway.  the gaps are good
indications of transient network connection problems.
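A minimal sketch of the per-chunk timeout idea described above, assuming POSIX poll(2) and a monotonic deadline that restarts every time another 1024 bytes has arrived. This is not the code in data_thread.c; the function name read_source, the CHUNK_BYTES constant, and the timeout_ms parameter are hypothetical. With timeout_ms = 10000, a source must deliver each 1024-byte chunk within 10 s, i.e. sustain roughly 1024 / 10 = 102.4 bytes/s, or it is reported as down.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define CHUNK_BYTES 1024 /* hypothetical: one timeout window per 1024 bytes */

/* Milliseconds from a monotonic clock (immune to wall-clock jumps). */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Read from fd until EOF or until cap bytes are buffered. Each successive
 * CHUNK_BYTES of data must arrive within timeout_ms; otherwise the source
 * is treated as dead. Returns bytes read, or -1 on timeout or I/O error. */
ssize_t read_source(int fd, char *buf, size_t cap, int timeout_ms)
{
    assert(timeout_ms > 0);
    size_t total = 0;
    size_t chunk_start = 0; /* offset where the current chunk began */
    long long deadline = now_ms() + timeout_ms;

    while (total < cap) {
        long long remaining = deadline - now_ms();
        if (remaining <= 0)
            return -1; /* chunk took too long: source considered down */

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc = poll(&pfd, 1, (int)remaining);
        if (rc < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal; retry */
            return -1;
        }
        if (rc == 0)
            return -1; /* no data before the deadline */

        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break; /* EOF: the source finished sending */
        total += (size_t)n;

        /* Once another full chunk has arrived, restart the clock. */
        if (total - chunk_start >= CHUNK_BYTES) {
            chunk_start = total - (total - chunk_start) % CHUNK_BYTES;
            deadline = now_ms() + timeout_ms;
        }
    }
    return (ssize_t)total;
}
```

Raising timeout_ms tolerates slower or burstier sources, but once a single chunk can take longer than the RRD heartbeat, the graphs get dead spots regardless, which is the trade-off described in the message.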

I'll tweak these values after lunch, but so far it's looking like two chunks of code are "at fault," one controlling the dead-source flag for new monitoring sources and one controlling the dead-source flag for pre-2.5.0 sources. My old source has far more dropouts (lasting 60 seconds or more) than the new source.

And the perl gmetad never had trouble polling 'em. So I don't know what the deal is at this point...

