Hello all,

 

A quick response would help!

 

Our cluster nodes send UDP unicast packets to a gmond 'collector'. The
gmond.conf on all nodes (compute and collector) has the following
values:

cleanup_threshold = 300 secs, heartbeat = 20 secs, collect_every = 300 secs,
time_threshold = 900 secs

 

The gmetad server polls the gmond 'collector' every 300 secs (5 minutes).
What we see is that the nodes are sometimes shown as up and sometimes as
down, and they flap often. Generally, either all nodes are shown as up or
all nodes are shown as down. Even while reporting a node as down, it also
shows that a heartbeat was received within the last 20 seconds.
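
To make the timing relationships easier to discuss, here is a rough Python
sketch that only does arithmetic on the values quoted above. It does not
model gmond or gmetad internals, and it is not meant to explain the
flapping. The names whole_intervals and summarize are made up for this
illustration and are not part of Ganglia.

```python
# Interval values quoted in this message, in seconds.
HEARTBEAT = 20
COLLECT_EVERY = 300
TIME_THRESHOLD = 900
CLEANUP_THRESHOLD = 300


def whole_intervals(window_secs: int, interval_secs: int) -> int:
    """Return how many whole intervals fit inside a window."""
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    return window_secs // interval_secs


def summarize(poll_secs: int) -> dict:
    """Hypothetical helper: compare a gmetad poll interval with the gmond values."""
    return {
        "heartbeats_per_poll": whole_intervals(poll_secs, HEARTBEAT),
        "polls_per_time_threshold": whole_intervals(TIME_THRESHOLD, poll_secs),
        "polls_per_cleanup_threshold": whole_intervals(CLEANUP_THRESHOLD, poll_secs),
    }


if __name__ == "__main__":
    # The two gmetad polling intervals tried so far.
    for poll in (300, 120):
        print(poll, summarize(poll))
```

With a 300 sec poll, up to three polls can fall inside one time_threshold
window of 900 secs, and a single poll spans the whole cleanup_threshold.
Whether either relationship matters for the up/down display is exactly what
we are asking about.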

 

We need to know the exact reason this is happening.

 

The gmetad.conf file has the default values for the RRD archives. Changing
the gmetad server to poll every 120 seconds does not seem to solve the
problem either.

 

Any suggestions or guidelines for choosing the gmetad polling interval and
the gmond cleanup_threshold value would be appreciated.

 

Thanks,

----------------------------------------------------------------------------

Utsav Agarwal

Systems Analyst

----------------------------------------------------------------------------

 
