We had ganglia deployed in the past but stopped using it because of too much
network traffic. I finally have the time to look at ganglia and other options
again, and I have been reading the archives. I've seen plenty of messages both
for and against multicast, but I haven't been able to get unicast to work in
my setup.
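For reference, my understanding of a basic unicast setup (using the gmond 3.x
channel directives) is something like the sketch below: one node acts as the
collector and every other node sends to it instead of joining a multicast
group. The hostname and port here are just placeholders, not my real setup:

/* on every node: send metrics to a single collector host
   (replace "monhost.example.com" with the real collector) */
udp_send_channel {
  host = monhost.example.com
  port = 8649
}

/* on the collector node only: receive the unicast packets
   and answer gmetad's TCP polls */
udp_recv_channel {
  port = 8649
}
tcp_accept_channel {
  port = 8649
}

That is roughly what I was attempting without success, so if something obvious
is missing there I'd be glad to hear it.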
So rather than going the unicast route, I started thinking about simply
increasing the time/value thresholds on the metrics gmond collects and sends
out. In my environment we don't mind trading some lag in the data for less
network activity; our nodes are usually either running a job at 100% CPU or
not running a job at all.
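My understanding of how the three knobs interact (please correct me if I have
this wrong): gmond samples a metric every collect_every seconds, but only
sends it when it has changed by more than value_threshold since the last send,
or when time_threshold seconds have passed without a send. So a group like the
following should produce roughly one cpu_user update every 300 seconds unless
user CPU swings by more than 20 points between samples:

collection_group {
  collect_every = 20          /* sample the metric every 20 s */
  time_threshold = 300        /* send it at least once every 300 s */
  metric {
    name = "cpu_user"
    value_threshold = "20.0"  /* also send if the value changes by more than 20 */
  }
}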
I looked for more information on a suggested "limited" configuration but
wasn't able to find anything. Below is my current stab at a gmond.conf; I
would appreciate suggestions and advice on what to change. I increased most of
the time_thresholds and removed some of the value_thresholds (such as those
for memory) altogether.
> collection_group {
>   collect_once = yes
>   time_threshold = 60
>   metric {
>     name = "heartbeat"
>   }
> }
>
> /* This collection group will send general info about this host every
>    3600 secs. This information doesn't change between reboots and is
>    only collected once. */
> collection_group {
>   collect_once = yes
>   time_threshold = 3600
>   metric {
>     name = "cpu_num"
>   }
...snip...
>   metric {
>     name = "location"
>   }
> }
> collection_group {
>   collect_every = 20
>   time_threshold = 300
>   /* CPU status */
>   metric {
>     name = "cpu_user"
>     value_threshold = "20.0"
>   }
>   metric {
>     name = "cpu_system"
>     value_threshold = "20.0"
>   }
>   metric {
>     name = "cpu_idle"
>     value_threshold = "30.0"
>   }
>   metric {
>     name = "cpu_nice"
>     value_threshold = "20.0"
>   }
>   metric {
>     name = "cpu_aidle"
>     value_threshold = "20.0"
>   }
>   metric {
>     name = "cpu_wio"
>     value_threshold = "20.0"
>   }
> }
>
> collection_group {
>   collect_every = 20
>   time_threshold = 300
>   /* Load Averages */
>   metric {
>     name = "load_one"
>     value_threshold = "20.0"
>   }
>   metric {
>     name = "load_five"
>     value_threshold = "20.0"
>   }
>   metric {
>     name = "load_fifteen"
>     value_threshold = "20.0"
>   }
> }
>
> /* This group collects the number of running and total processes */
> collection_group {
>   collect_every = 80
>   time_threshold = 950
>   metric {
>     name = "proc_run"
>   }
>   metric {
>     name = "proc_total"
>   }
> }
>
> /* This collection group grabs the volatile memory metrics every 40 secs
>    and sends them at least every 720 secs. This time_threshold can be
>    increased significantly to reduce unneeded network traffic. */
> collection_group {
>   collect_every = 40
>   time_threshold = 720
>   metric {
>     name = "mem_free"
>   }
>   metric {
>     name = "mem_shared"
>   }
>   metric {
>     name = "mem_buffers"
>   }
>   metric {
>     name = "mem_cached"
>   }
>   metric {
>     name = "swap_free"
>   }
> }
>
> collection_group {
>   collect_every = 40
>   time_threshold = 300
>   metric {
>     name = "bytes_out"
>     value_threshold = 4096
>   }
>   metric {
>     name = "bytes_in"
>     value_threshold = 4096
>   }
>   metric {
>     name = "pkts_in"
>     value_threshold = 256
>   }
>   metric {
>     name = "pkts_out"
>     value_threshold = 256
>   }
> }
>
> /* Different than 2.5.x default since the old config made no sense */
> collection_group {
>   collect_every = 1800
>   time_threshold = 3600
>   metric {
>     name = "disk_total"
>     value_threshold = 20.0
>   }
> }
>
> collection_group {
>   collect_every = 40
>   time_threshold = 300
>   metric {
>     name = "disk_free"
>     value_threshold = 10.0
>   }
>   metric {
>     name = "part_max_used"
>     value_threshold = 10.0
>   }
> }
Thanks,
+R
--
Ryan Dionne
System Analyst
GeoCenter, Inc.
(281) 443-8150
16800 Greenspoint Park Dr., Suite 100-S
Houston, TX 77060
Celebrating our 25th Anniversary: 1980 - 2005