Hello all,
A quick response would help!

Our cluster nodes send UDP unicast packets to a gmond 'collector'. The gmond.conf on all the nodes (compute and collector) has the following values:

  cleanup_threshold = 300 secs
  heartbeat         = 20 secs
  collect_every     = 300 secs
  time_threshold    = 900 secs

The gmetad server polls the gmond 'collector' every 300 secs (5 minutes).

What we see is that the nodes are sometimes shown as up and sometimes as down, and they flap often. Generally, either all nodes are shown as up or all nodes are shown as down. Even while the nodes are reported as down, the display also shows that a heartbeat was received within the last 20 seconds.

We need to know the exact reason this is happening. The gmetad.conf file has default values for the RRD archives. Changing the gmetad server to poll every 120 seconds does not seem to solve the problem either.

Any suggestions or guidelines for choosing the gmetad polling interval and the gmond cleanup_threshold values will be appreciated.

Thanks,

------------------------------------------------------------------------------------
Utsav Agarwal
Systems Analyst
------------------------------------------------------------------------------------

