My solution and notes inline below.

On Wed, May 14, 2008 at 8:02 AM, Brad Nicholes <[EMAIL PROTECTED]> wrote:

> >>> On 5/13/2008 at 11:34 PM, in message
> <[EMAIL PROTECTED]>, "Jeremy
> LaTrasse" <[EMAIL PROTECTED]> wrote:
> > I changed our configs over to unicast, which has seemingly eliminated most
> > of our problems, except one egregious one, and the log files are still
> > being filled with
> >
> >  illegal attempt to update using time 1210742641 when last update time is
> > 1210742641 (minimum one second step)
> >
> > The problem seems to be that the gmetad is not able to get new information
> > from headnodes.
> >
> > In the case of 1210742641, the node had not reported to the headnode for 54
> > sec, and therefore rrdtool on the gmetad could not be expected to update a
> > file with the same information.
> >
> > Output from headnodes for that node confirms.
> >
> > <HOST NAME="HOSTNAME.twitter.com" IP="X.X.X.X" REPORTED="1210742641"
> > TN="54" TMAX="20" DMAX="86400" LOCATION="1" GMOND_STARTED="1210742641">
> >
> > My question now is, why would gmond not be reporting for 54 sec?  The load
> > on the machines that are taking longer than 20 sec to report consistently
> > is lower than others in the cluster who report far more frequently.
> >
>
> Have you run gmond on that particular node in debug mode to verify that it
> is gathering and sending data correctly?  Have you hit that node directly on
> port 8649 to make sure that it is generating correct XML output?  Have you
> run your head node gmond in debug mode to verify that it is only receiving
> packets from the problem node every 54 seconds?  Just trying to narrow down
> whether it is a problem with the node gmond, the head node gmond or gmetad
> or something in-between.
>

Yes, at each phase of troubleshooting I compared the log output against the
gmetad output, the gmond headnode output, and the gmond node output.
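
For anyone who wants to repeat that comparison without eyeballing XML, here
is a minimal sketch in Python. It assumes gmond dumps its XML on connect to
port 8649 (as Brad describes) and that HOST elements carry TN and TMAX
attributes like the one quoted above. The function names are my own, not
part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host, port=8649, timeout=5.0):
    """Connect to a gmond and read the XML it writes until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def late_hosts(xml_bytes):
    """Return (name, tn, tmax) for every HOST whose TN exceeds its TMAX."""
    root = ET.fromstring(xml_bytes)
    late = []
    for host in root.iter("HOST"):
        tn = int(host.get("TN", "0"))
        tmax = int(host.get("TMAX", "0"))
        if tn > tmax:
            late.append((host.get("NAME"), tn, tmax))
    return late
```

Running late_hosts(read_gmond_xml("your-headnode")) against the headnode and
then against the node itself shows which side first sees TN climb past TMAX
("your-headnode" is a placeholder).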


>
> > Next, how can I change HOST TMAX if necessary? I've read the gmond.conf man
> > page, the wiki, etc... seems like only location is configurable for host in
> > the gmond.conf.
> >
>
> You can't.  For some reason TMAX is hardcoded to 20.  I'm not sure why.
>  The only way to change it would be to change the source and rebuild gmond.
>

Seems like TMAX should be a config file option.

My solution was to set collect_every to 10 seconds across *all* checks in
the gmond.conf. I couldn't find any correlating reason for these
checks to be taking longer than 20 seconds. System load, multicast, unicast,
all seem irrelevant to this problem. It sounds like it's a time collection
or comparison issue in the gmond itself, but I don't have time to dig into
it right now.
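
If you have a lot of config files to change, something like the following
Python sketch can do the edit. It assumes every interval is written as
"collect_every = N" on its own line inside a collection_group, and it leaves
commented-out lines and other settings such as time_threshold alone. The
function name is made up for illustration.

```python
import re

# Matches an uncommented "collect_every = <number>" at the start of a line,
# keeping the indentation and the "collect_every = " part in group 1.
COLLECT_EVERY = re.compile(r"^(\s*collect_every\s*=\s*)\d+", re.MULTILINE)


def set_collect_every(conf_text, seconds=10):
    """Return conf_text with every collect_every value replaced by seconds."""
    return COLLECT_EVERY.sub(lambda m: f"{m.group(1)}{seconds}", conf_text)
```

Read the file, pass its text through set_collect_every, write it back, and
restart gmond so it picks up the new intervals.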

There are no more errors in the logs because rrdtool is no longer being
asked to apply stale data.
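
The log message itself says rrdtool wants each update to be at least one
second newer than the last. Here is a toy model of that rule in Python,
assuming (as the message suggests) that the same timestamp keeps being handed
over until the node reports again. This is only an illustration of the rule,
not rrdtool or gmetad code, and the function name is hypothetical.

```python
def rejected_updates(timestamps, min_step=1):
    """Return the timestamps a store with a minimum step would refuse.

    A sample is kept when it is at least min_step seconds newer than the
    last sample that was kept, and refused otherwise.
    """
    rejected = []
    last = None
    for ts in timestamps:
        if last is not None and ts - last < min_step:
            rejected.append(ts)
        else:
            last = ts
    return rejected
```

Feeding it the same time over several polls, for example
[1210742641, 1210742641, 1210742641, 1210742695], refuses the two repeats and
accepts the sample that arrives 54 seconds later.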

My graphs are back to the way I would expect them to be, although a 10 second
check period makes some graphs look spikier than they did before all this.

I'm still looking for someone with deep experience to help us build out our
ganglia practice here on a contract basis.

Jeremy


>
> Brad
>
>
> > Again, system time is synchronized across all these machines to within .04
> > seconds.
> >
> >
> > Jeremy
> >
> > On Tue, May 13, 2008 at 3:50 PM, Bernard Li <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Jeremy:
> >>
> >> On Tue, May 13, 2008 at 1:49 PM, Jeremy LaTrasse <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Where should I be going for comprehensive documentation that describes
> >> > each of the stanzas in both gmond and gmetad config files, is there one
> >> > standard document? I can't find one in the sourceforge wiki.
> >>
> >> For Ganglia 3.0.x, man gmond.conf is your best bet.  I checked and it
> >> talks about unicast.
> >>
> >> For gmetad -- the configuration options are pretty straightforward,
> >> and the comments in the standard gmetad.conf should be fairly
> >> self-explanatory.
> >>
> >> Cheers,
> >>
> >> Bernard
> >>
>
>
>
>
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general