Re: [Ganglia-general] gmetad only reads from one node of each data_source

Joan Marc Riera Fri, 22 Oct 2010 03:49:26 -0700

My gmond configuration has only minor changes, and I'm happy to share it.

I attach 3 files: the gmond configuration from node01, node02 and nodegpu01.

As far as I remember, the only changes I made are the name and owner, which depend on the Cluster group, and the udp send and recv channels, which are different for each Cluster group.

Thanks.

On 10/22/2010 12:30 PM, Martin Knoblauch wrote:

Hi Joan,

what you describe sounds fine with regard to "gmetad". "gmetad" will only talk to one node per data_source. If that node fails and you have more than one node listed, it will [try to] fail over to the next available node. So far, everything is working as expected.
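
The failover behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, not gmetad's actual code: the function name first_reachable and the is_up callback are hypothetical, and the host list is taken from the "CPU cluster" data_source line in the original post.

```python
def first_reachable(hosts, is_up):
    """Return the first host for which is_up(host) is true, else None.

    Models "talk to one node per data_source, move on to the next
    listed node only if that one is down".
    """
    for host in hosts:
        if is_up(host):
            return host
    return None


# Hosts as listed on the "CPU cluster" data_source line.
cpu_cluster = [f"node{i:02d}" for i in range(1, 11)]

# Pretend gmond has been stopped on node01 (hypothetical state).
stopped = {"node01"}
print(first_reachable(cpu_cluster, lambda h: h not in stopped))  # node02
```

With node01 stopped the sketch picks node02, which matches what Joan observes when stopping gmond on one node after another.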

Your problem is that apparently each of node01..10 only "knows" its own metrics. Nodes listed on the data_source line need to know the metrics of all nodes in the respective cluster. So it is more a problem with the configuration of your "gmond" services. Care to share the configuration of one of the nodes?
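
One way to check what a single node "knows" is to read the XML it serves on its tcp_accept_channel (port 8649 in the attached configurations) and list the HOST entries. The following Python sketch assumes the usual gmond XML layout with HOST elements carrying a NAME attribute; the function names and the sample document are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump gmond sends on connect, until it closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def host_names(xml_bytes):
    """Return the sorted NAME attributes of all HOST elements in the dump."""
    root = ET.fromstring(xml_bytes)
    return sorted(h.get("NAME") for h in root.iter("HOST"))


# Hand-written sample standing in for a real dump (illustration only).
SAMPLE_XML = (
    b'<GANGLIA_XML><CLUSTER NAME="CPU cluster">'
    b'<HOST NAME="node01"/><HOST NAME="node02"/>'
    b"</CLUSTER></GANGLIA_XML>"
)
print(host_names(SAMPLE_XML))  # ['node01', 'node02']
```

Against a live node this would be called as host_names(fetch_gmond_xml("node01")). If the result contains only node01 itself, that node is not receiving the metrics of its neighbours.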

Cheers

Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
From: Joan Marc Riera <[email protected]>
To: [email protected]
Sent: Fri, October 22, 2010 11:50:05 AM
Subject: [Ganglia-general] gmetad only reads from one node of each data_source

Hello,

I have gmetad with following conf running :
r...@fbmsgga01:/var/lib/ganglia# cat /etc/ganglia/gmetad.conf |grep -v ^# |grep -v ^$
data_source "CPU cluster" node01 node02 node03 node04 node05 node06 node07 node08 node09 node10
data_source "GPU cluster" nodegpu01
 gridname "FBM"
r...@fbmsgga01:/var/lib/ganglia#
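
For reference, the data_source lines above can be split into their parts with a short Python sketch. It assumes the usual gmetad.conf layout (a quoted name, an optional polling interval in seconds, then host[:port] entries with 8649 as the default port) and does not handle IPv6 literals; parse_data_source is a hypothetical helper, not part of Ganglia.

```python
import shlex

DEFAULT_PORT = 8649  # assumed default when no port is given


def parse_data_source(line):
    """Split one data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError(f"not a data_source line: {line!r}")
    name, rest = tokens[1], tokens[2:]
    interval = None
    # An all-digit first token is treated as the optional polling interval.
    if rest[0].isdigit():
        interval, rest = int(rest[0]), rest[1:]
    hosts = []
    for entry in rest:
        host, _, port = entry.partition(":")
        hosts.append((host, int(port) if port else DEFAULT_PORT))
    return name, interval, hosts


print(parse_data_source('data_source "GPU cluster" nodegpu01'))
# ('GPU cluster', None, [('nodegpu01', 8649)])
```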
    
All nodes and gmetad server are on the same vlan.

I only receive nodegpu01 and node01 info, but if I stop gmond on node01 I start receiving from node02. If I stop node02 I start receiving from node03, and so on.

I do not understand what is happening; everything was working fine until yesterday, when I restarted the gmetad host.

Data from nodegpu01 is being received and plotted fine.

What is going on here?

Thanks.

Marc

--

Joan Marc Riera Duocastella

Barcelona Media - Centre d'Innovació
Av. Diagonal, 177, planta 9 08018 - BARCELONA
Telèfon +34 93 238 14 00 Fax +34 93 309 31 88
www.barcelonamedia.org

--- Begin Message ---

/* This configuration is as close to 2.5.x default behavior as possible 
   The values closely match ./gmond/metric.h definitions in 2.5.x */ 
globals {                    
  daemonize = yes              
  setuid = yes             
  user = ganglia              
  debug_level = 0               
  max_udp_msg_len = 1472        
  mute = no             
  deaf = no             
  host_dmax = 0 /*secs */ 
  cleanup_threshold = 300 /*secs */ 
  gexec = no             
  send_metadata_interval = 0     
} 

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside 
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS>
 * will NOT be wrapped inside of a <CLUSTER> tag. */
cluster { 
  name = "GPU cluster" 
  owner = "Propietat de FBM" 
  latlong = "unspecified" 
  url = "unspecified" 
} 

/* The host section describes attributes of the host, like the location */ 
host { 
  location = "[email protected]" 
} 

/* Feel free to specify as many udp_send_channels as you like.  Gmond 
   used to only support having a single channel */ 
udp_send_channel { 
  mcast_join = 239.2.11.72
  port = 8649 
  ttl = 1 
} 

/* You can specify as many udp_recv_channels as you like as well. */ 
udp_recv_channel { 
  mcast_join = 239.2.11.72 
  port = 8649 
  bind = 239.2.11.72
} 

/* You can specify as many tcp_accept_channels as you like to share 
   an xml description of the state of the cluster */ 
tcp_accept_channel { 
  port = 8649 
} 

/* Each metrics module that is referenced by gmond must be specified and 
   loaded. If the module has been statically linked with gmond, it does not 
   require a load path. However all dynamically loadable modules must include 
   a load path. */ 
modules { 
  module { 
    name = "core_metrics" 
  } 
  module { 
    name = "cpu_module" 
    path = "/usr/lib/ganglia/modcpu.so" 
  } 
  module { 
    name = "disk_module" 
    path = "/usr/lib/ganglia/moddisk.so" 
  } 
  module { 
    name = "load_module" 
    path = "/usr/lib/ganglia/modload.so" 
  } 
  module { 
    name = "mem_module" 
    path = "/usr/lib/ganglia/modmem.so" 
  } 
  module { 
    name = "net_module" 
    path = "/usr/lib/ganglia/modnet.so" 
  } 
  module { 
    name = "proc_module" 
    path = "/usr/lib/ganglia/modproc.so" 
  } 
  module { 
    name = "sys_module" 
    path = "/usr/lib/ganglia/modsys.so" 
  } 
} 

include ('/etc/ganglia/conf.d/*.conf') 


/* The old internal 2.5.x metric array has been replaced by the following 
   collection_group directives.  What follows is the default behavior for 
   collecting and sending metrics that is as close to 2.5.x behavior as 
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every 
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses 
   the age of the running gmond. */ 
collection_group { 
  collect_once = yes 
  time_threshold = 20 
  metric { 
    name = "heartbeat" 
  } 
} 

/* This collection group will send general info about this host every 1200
   secs.  This information doesn't change between reboots and is only
   collected once. */
collection_group { 
  collect_once = yes 
  time_threshold = 1200 
  metric { 
    name = "cpu_num" 
    title = "CPU Count" 
  } 
  metric { 
    name = "cpu_speed" 
    title = "CPU Speed" 
  } 
  metric { 
    name = "mem_total" 
    title = "Memory Total" 
  } 
  /* Should this be here? Swap can be added/removed between reboots. */ 
  metric { 
    name = "swap_total" 
    title = "Swap Space Total" 
  } 
  metric { 
    name = "boottime" 
    title = "Last Boot Time" 
  } 
  metric { 
    name = "machine_type" 
    title = "Machine Type" 
  } 
  metric { 
    name = "os_name" 
    title = "Operating System" 
  } 
  metric { 
    name = "os_release" 
    title = "Operating System Release" 
  } 
  metric { 
    name = "location" 
    title = "Location" 
  } 
} 

/* This collection group will send the status of gexecd for this host every
   300 secs */
/* Unlike 2.5.x the default behavior is to report gexecd OFF.  */ 
collection_group { 
  collect_once = yes 
  time_threshold = 300 
  metric { 
    name = "gexec" 
    title = "Gexec Status" 
  } 
} 

/* This collection group will collect the CPU status info every 20 secs.
   The time threshold is set to 90 seconds.  In honesty, this time_threshold
   could be set significantly higher to reduce unnecessary network chatter. */
collection_group { 
  collect_every = 20 
  time_threshold = 90 
  /* CPU status */ 
  metric { 
    name = "cpu_user"  
    value_threshold = "1.0" 
    title = "CPU User" 
  } 
  metric { 
    name = "cpu_system"   
    value_threshold = "1.0" 
    title = "CPU System" 
  } 
  metric { 
    name = "cpu_idle"  
    value_threshold = "5.0" 
    title = "CPU Idle" 
  } 
  metric { 
    name = "cpu_nice"  
    value_threshold = "1.0" 
    title = "CPU Nice" 
  } 
  metric { 
    name = "cpu_aidle" 
    value_threshold = "5.0" 
    title = "CPU aidle" 
  } 
  metric { 
    name = "cpu_wio" 
    value_threshold = "1.0" 
    title = "CPU wio" 
  } 
  /* The next two metrics are optional if you want more detail... 
     ... since they are accounted for in cpu_system.  
  metric { 
    name = "cpu_intr" 
    value_threshold = "1.0" 
    title = "CPU intr" 
  } 
  metric { 
    name = "cpu_sintr" 
    value_threshold = "1.0" 
    title = "CPU sintr" 
  } 
  */ 
} 

collection_group { 
  collect_every = 20 
  time_threshold = 90 
  /* Load Averages */ 
  metric { 
    name = "load_one" 
    value_threshold = "1.0" 
    title = "One Minute Load Average" 
  } 
  metric { 
    name = "load_five" 
    value_threshold = "1.0" 
    title = "Five Minute Load Average" 
  } 
  metric { 
    name = "load_fifteen" 
    value_threshold = "1.0" 
    title = "Fifteen Minute Load Average" 
  }
} 

/* This group collects the number of running and total processes */ 
collection_group { 
  collect_every = 80 
  time_threshold = 950 
  metric { 
    name = "proc_run" 
    value_threshold = "1.0" 
    title = "Total Running Processes" 
  } 
  metric { 
    name = "proc_total" 
    value_threshold = "1.0" 
    title = "Total Processes" 
  } 
}

/* This collection group grabs the volatile memory metrics every 40 secs and 
   sends them at least every 180 secs.  This time_threshold can be increased 
   significantly to reduce unneeded network traffic. */ 
collection_group { 
  collect_every = 40 
  time_threshold = 180 
  metric { 
    name = "mem_free" 
    value_threshold = "1024.0" 
    title = "Free Memory" 
  } 
  metric { 
    name = "mem_shared" 
    value_threshold = "1024.0" 
    title = "Shared Memory" 
  } 
  metric { 
    name = "mem_buffers" 
    value_threshold = "1024.0" 
    title = "Memory Buffers" 
  } 
  metric { 
    name = "mem_cached" 
    value_threshold = "1024.0" 
    title = "Cached Memory" 
  } 
  metric { 
    name = "swap_free" 
    value_threshold = "1024.0" 
    title = "Free Swap Space" 
  } 
} 

collection_group { 
  collect_every = 40 
  time_threshold = 300 
  metric { 
    name = "bytes_out" 
    value_threshold = 4096 
    title = "Bytes Sent" 
  } 
  metric { 
    name = "bytes_in" 
    value_threshold = 4096 
    title = "Bytes Received" 
  } 
  metric { 
    name = "pkts_in" 
    value_threshold = 256 
    title = "Packets Received" 
  } 
  metric { 
    name = "pkts_out" 
    value_threshold = 256 
    title = "Packets Sent" 
  } 
}

/* Different than 2.5.x default since the old config made no sense */ 
collection_group { 
  collect_every = 1800 
  time_threshold = 3600 
  metric { 
    name = "disk_total" 
    value_threshold = 1.0 
    title = "Total Disk Space" 
  } 
}

collection_group { 
  collect_every = 40 
  time_threshold = 180 
  metric { 
    name = "disk_free" 
    value_threshold = 1.0 
    title = "Disk Space Available" 
  } 
  metric { 
    name = "part_max_used" 
    value_threshold = 1.0 
    title = "Maximum Disk Space Used" 
  } 
}

--- End Message ---

--- Begin Message ---

/* This configuration is as close to 2.5.x default behavior as possible 
   The values closely match ./gmond/metric.h definitions in 2.5.x */ 
globals {                    
  daemonize = yes              
  setuid = yes             
  user = ganglia              
  debug_level = 0               
  max_udp_msg_len = 1472        
  mute = no             
  deaf = no             
  host_dmax = 0 /*secs */ 
  cleanup_threshold = 300 /*secs */ 
  gexec = no             
  send_metadata_interval = 0     
} 

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside 
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS>
 * will NOT be wrapped inside of a <CLUSTER> tag. */
cluster { 
  name = "CPU cluster" 
  owner = "Propietat de FBM" 
  latlong = "unspecified" 
  url = "unspecified" 
} 

/* The host section describes attributes of the host, like the location */ 
host { 
  location = "[email protected]" 
} 

/* Feel free to specify as many udp_send_channels as you like.  Gmond 
   used to only support having a single channel */ 
udp_send_channel { 
  mcast_join = 239.2.11.71
  port = 8649 
  ttl = 1 
} 

/* You can specify as many udp_recv_channels as you like as well. */ 
udp_recv_channel { 
  mcast_join = 239.2.11.71 
  port = 8649 
  bind = 239.2.11.71
} 

/* You can specify as many tcp_accept_channels as you like to share 
   an xml description of the state of the cluster */ 
tcp_accept_channel { 
  port = 8649 
} 

/* Each metrics module that is referenced by gmond must be specified and 
   loaded. If the module has been statically linked with gmond, it does not 
   require a load path. However all dynamically loadable modules must include 
   a load path. */ 
modules { 
  module { 
    name = "core_metrics" 
  } 
  module { 
    name = "cpu_module" 
    path = "/usr/lib/ganglia/modcpu.so" 
  } 
  module { 
    name = "disk_module" 
    path = "/usr/lib/ganglia/moddisk.so" 
  } 
  module { 
    name = "load_module" 
    path = "/usr/lib/ganglia/modload.so" 
  } 
  module { 
    name = "mem_module" 
    path = "/usr/lib/ganglia/modmem.so" 
  } 
  module { 
    name = "net_module" 
    path = "/usr/lib/ganglia/modnet.so" 
  } 
  module { 
    name = "proc_module" 
    path = "/usr/lib/ganglia/modproc.so" 
  } 
  module { 
    name = "sys_module" 
    path = "/usr/lib/ganglia/modsys.so" 
  } 
} 

include ('/etc/ganglia/conf.d/*.conf') 


/* The old internal 2.5.x metric array has been replaced by the following 
   collection_group directives.  What follows is the default behavior for 
   collecting and sending metrics that is as close to 2.5.x behavior as 
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every 
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses 
   the age of the running gmond. */ 
collection_group { 
  collect_once = yes 
  time_threshold = 20 
  metric { 
    name = "heartbeat" 
  } 
} 

/* This collection group will send general info about this host every 1200
   secs.  This information doesn't change between reboots and is only
   collected once. */
collection_group { 
  collect_once = yes 
  time_threshold = 1200 
  metric { 
    name = "cpu_num" 
    title = "CPU Count" 
  } 
  metric { 
    name = "cpu_speed" 
    title = "CPU Speed" 
  } 
  metric { 
    name = "mem_total" 
    title = "Memory Total" 
  } 
  /* Should this be here? Swap can be added/removed between reboots. */ 
  metric { 
    name = "swap_total" 
    title = "Swap Space Total" 
  } 
  metric { 
    name = "boottime" 
    title = "Last Boot Time" 
  } 
  metric { 
    name = "machine_type" 
    title = "Machine Type" 
  } 
  metric { 
    name = "os_name" 
    title = "Operating System" 
  } 
  metric { 
    name = "os_release" 
    title = "Operating System Release" 
  } 
  metric { 
    name = "location" 
    title = "Location" 
  } 
} 

/* This collection group will send the status of gexecd for this host every
   300 secs */
/* Unlike 2.5.x the default behavior is to report gexecd OFF.  */ 
collection_group { 
  collect_once = yes 
  time_threshold = 300 
  metric { 
    name = "gexec" 
    title = "Gexec Status" 
  } 
} 

/* This collection group will collect the CPU status info every 20 secs.
   The time threshold is set to 90 seconds.  In honesty, this time_threshold
   could be set significantly higher to reduce unnecessary network chatter. */
collection_group { 
  collect_every = 20 
  time_threshold = 90 
  /* CPU status */ 
  metric { 
    name = "cpu_user"  
    value_threshold = "1.0" 
    title = "CPU User" 
  } 
  metric { 
    name = "cpu_system"   
    value_threshold = "1.0" 
    title = "CPU System" 
  } 
  metric { 
    name = "cpu_idle"  
    value_threshold = "5.0" 
    title = "CPU Idle" 
  } 
  metric { 
    name = "cpu_nice"  
    value_threshold = "1.0" 
    title = "CPU Nice" 
  } 
  metric { 
    name = "cpu_aidle" 
    value_threshold = "5.0" 
    title = "CPU aidle" 
  } 
  metric { 
    name = "cpu_wio" 
    value_threshold = "1.0" 
    title = "CPU wio" 
  } 
  /* The next two metrics are optional if you want more detail... 
     ... since they are accounted for in cpu_system.  
  metric { 
    name = "cpu_intr" 
    value_threshold = "1.0" 
    title = "CPU intr" 
  } 
  metric { 
    name = "cpu_sintr" 
    value_threshold = "1.0" 
    title = "CPU sintr" 
  } 
  */ 
} 

collection_group { 
  collect_every = 20 
  time_threshold = 90 
  /* Load Averages */ 
  metric { 
    name = "load_one" 
    value_threshold = "1.0" 
    title = "One Minute Load Average" 
  } 
  metric { 
    name = "load_five" 
    value_threshold = "1.0" 
    title = "Five Minute Load Average" 
  } 
  metric { 
    name = "load_fifteen" 
    value_threshold = "1.0" 
    title = "Fifteen Minute Load Average" 
  }
} 

/* This group collects the number of running and total processes */ 
collection_group { 
  collect_every = 80 
  time_threshold = 950 
  metric { 
    name = "proc_run" 
    value_threshold = "1.0" 
    title = "Total Running Processes" 
  } 
  metric { 
    name = "proc_total" 
    value_threshold = "1.0" 
    title = "Total Processes" 
  } 
}

/* This collection group grabs the volatile memory metrics every 40 secs and 
   sends them at least every 180 secs.  This time_threshold can be increased 
   significantly to reduce unneeded network traffic. */ 
collection_group { 
  collect_every = 40 
  time_threshold = 180 
  metric { 
    name = "mem_free" 
    value_threshold = "1024.0" 
    title = "Free Memory" 
  } 
  metric { 
    name = "mem_shared" 
    value_threshold = "1024.0" 
    title = "Shared Memory" 
  } 
  metric { 
    name = "mem_buffers" 
    value_threshold = "1024.0" 
    title = "Memory Buffers" 
  } 
  metric { 
    name = "mem_cached" 
    value_threshold = "1024.0" 
    title = "Cached Memory" 
  } 
  metric { 
    name = "swap_free" 
    value_threshold = "1024.0" 
    title = "Free Swap Space" 
  } 
} 

collection_group { 
  collect_every = 40 
  time_threshold = 300 
  metric { 
    name = "bytes_out" 
    value_threshold = 4096 
    title = "Bytes Sent" 
  } 
  metric { 
    name = "bytes_in" 
    value_threshold = 4096 
    title = "Bytes Received" 
  } 
  metric { 
    name = "pkts_in" 
    value_threshold = 256 
    title = "Packets Received" 
  } 
  metric { 
    name = "pkts_out" 
    value_threshold = 256 
    title = "Packets Sent" 
  } 
}

/* Different than 2.5.x default since the old config made no sense */ 
collection_group { 
  collect_every = 1800 
  time_threshold = 3600 
  metric { 
    name = "disk_total" 
    value_threshold = 1.0 
    title = "Total Disk Space" 
  } 
}

collection_group { 
  collect_every = 40 
  time_threshold = 180 
  metric { 
    name = "disk_free" 
    value_threshold = 1.0 
    title = "Disk Space Available" 
  } 
  metric { 
    name = "part_max_used" 
    value_threshold = 1.0 
    title = "Maximum Disk Space Used" 
  } 
}

--- End Message ---

--- Begin Message ---

/* This configuration is as close to 2.5.x default behavior as possible 
   The values closely match ./gmond/metric.h definitions in 2.5.x */ 
globals {                    
  daemonize = yes              
  setuid = yes             
  user = ganglia              
  debug_level = 0               
  max_udp_msg_len = 1472        
  mute = no             
  deaf = no             
  host_dmax = 0 /*secs */ 
  cleanup_threshold = 300 /*secs */ 
  gexec = no             
  send_metadata_interval = 0     
} 

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside 
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS>
 * will NOT be wrapped inside of a <CLUSTER> tag. */
cluster { 
  name = "CPU cluster" 
  owner = "Propietat de FBM" 
  latlong = "unspecified" 
  url = "unspecified" 
} 

/* The host section describes attributes of the host, like the location */ 
host { 
  location = "[email protected]" 
} 

/* Feel free to specify as many udp_send_channels as you like.  Gmond 
   used to only support having a single channel */ 
udp_send_channel { 
  mcast_join = 239.2.11.71
  port = 8649 
  ttl = 1 
} 

/* You can specify as many udp_recv_channels as you like as well. */ 
udp_recv_channel { 
  mcast_join = 239.2.11.71 
  port = 8649 
  bind = 239.2.11.71
} 

/* You can specify as many tcp_accept_channels as you like to share 
   an xml description of the state of the cluster */ 
tcp_accept_channel { 
  port = 8649 
} 

/* Each metrics module that is referenced by gmond must be specified and 
   loaded. If the module has been statically linked with gmond, it does not 
   require a load path. However all dynamically loadable modules must include 
   a load path. */ 
modules { 
  module { 
    name = "core_metrics" 
  } 
  module { 
    name = "cpu_module" 
    path = "/usr/lib/ganglia/modcpu.so" 
  } 
  module { 
    name = "disk_module" 
    path = "/usr/lib/ganglia/moddisk.so" 
  } 
  module { 
    name = "load_module" 
    path = "/usr/lib/ganglia/modload.so" 
  } 
  module { 
    name = "mem_module" 
    path = "/usr/lib/ganglia/modmem.so" 
  } 
  module { 
    name = "net_module" 
    path = "/usr/lib/ganglia/modnet.so" 
  } 
  module { 
    name = "proc_module" 
    path = "/usr/lib/ganglia/modproc.so" 
  } 
  module { 
    name = "sys_module" 
    path = "/usr/lib/ganglia/modsys.so" 
  } 
} 

include ('/etc/ganglia/conf.d/*.conf') 


/* The old internal 2.5.x metric array has been replaced by the following 
   collection_group directives.  What follows is the default behavior for 
   collecting and sending metrics that is as close to 2.5.x behavior as 
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every 
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses 
   the age of the running gmond. */ 
collection_group { 
  collect_once = yes 
  time_threshold = 20 
  metric { 
    name = "heartbeat" 
  } 
} 

/* This collection group will send general info about this host every 1200
   secs.  This information doesn't change between reboots and is only
   collected once. */
collection_group { 
  collect_once = yes 
  time_threshold = 1200 
  metric { 
    name = "cpu_num" 
    title = "CPU Count" 
  } 
  metric { 
    name = "cpu_speed" 
    title = "CPU Speed" 
  } 
  metric { 
    name = "mem_total" 
    title = "Memory Total" 
  } 
  /* Should this be here? Swap can be added/removed between reboots. */ 
  metric { 
    name = "swap_total" 
    title = "Swap Space Total" 
  } 
  metric { 
    name = "boottime" 
    title = "Last Boot Time" 
  } 
  metric { 
    name = "machine_type" 
    title = "Machine Type" 
  } 
  metric { 
    name = "os_name" 
    title = "Operating System" 
  } 
  metric { 
    name = "os_release" 
    title = "Operating System Release" 
  } 
  metric { 
    name = "location" 
    title = "Location" 
  } 
} 

/* This collection group will send the status of gexecd for this host every
   300 secs */
/* Unlike 2.5.x the default behavior is to report gexecd OFF.  */ 
collection_group { 
  collect_once = yes 
  time_threshold = 300 
  metric { 
    name = "gexec" 
    title = "Gexec Status" 
  } 
} 

/* This collection group will collect the CPU status info every 20 secs.
   The time threshold is set to 90 seconds.  In honesty, this time_threshold
   could be set significantly higher to reduce unnecessary network chatter. */
collection_group { 
  collect_every = 20 
  time_threshold = 90 
  /* CPU status */ 
  metric { 
    name = "cpu_user"  
    value_threshold = "1.0" 
    title = "CPU User" 
  } 
  metric { 
    name = "cpu_system"   
    value_threshold = "1.0" 
    title = "CPU System" 
  } 
  metric { 
    name = "cpu_idle"  
    value_threshold = "5.0" 
    title = "CPU Idle" 
  } 
  metric { 
    name = "cpu_nice"  
    value_threshold = "1.0" 
    title = "CPU Nice" 
  } 
  metric { 
    name = "cpu_aidle" 
    value_threshold = "5.0" 
    title = "CPU aidle" 
  } 
  metric { 
    name = "cpu_wio" 
    value_threshold = "1.0" 
    title = "CPU wio" 
  } 
  /* The next two metrics are optional if you want more detail... 
     ... since they are accounted for in cpu_system.  
  metric { 
    name = "cpu_intr" 
    value_threshold = "1.0" 
    title = "CPU intr" 
  } 
  metric { 
    name = "cpu_sintr" 
    value_threshold = "1.0" 
    title = "CPU sintr" 
  } 
  */ 
} 

collection_group { 
  collect_every = 20 
  time_threshold = 90 
  /* Load Averages */ 
  metric { 
    name = "load_one" 
    value_threshold = "1.0" 
    title = "One Minute Load Average" 
  } 
  metric { 
    name = "load_five" 
    value_threshold = "1.0" 
    title = "Five Minute Load Average" 
  } 
  metric { 
    name = "load_fifteen" 
    value_threshold = "1.0" 
    title = "Fifteen Minute Load Average" 
  }
} 

/* This group collects the number of running and total processes */ 
collection_group { 
  collect_every = 80 
  time_threshold = 950 
  metric { 
    name = "proc_run" 
    value_threshold = "1.0" 
    title = "Total Running Processes" 
  } 
  metric { 
    name = "proc_total" 
    value_threshold = "1.0" 
    title = "Total Processes" 
  } 
}

/* This collection group grabs the volatile memory metrics every 40 secs and 
   sends them at least every 180 secs.  This time_threshold can be increased 
   significantly to reduce unneeded network traffic. */ 
collection_group { 
  collect_every = 40 
  time_threshold = 180 
  metric { 
    name = "mem_free" 
    value_threshold = "1024.0" 
    title = "Free Memory" 
  } 
  metric { 
    name = "mem_shared" 
    value_threshold = "1024.0" 
    title = "Shared Memory" 
  } 
  metric { 
    name = "mem_buffers" 
    value_threshold = "1024.0" 
    title = "Memory Buffers" 
  } 
  metric { 
    name = "mem_cached" 
    value_threshold = "1024.0" 
    title = "Cached Memory" 
  } 
  metric { 
    name = "swap_free" 
    value_threshold = "1024.0" 
    title = "Free Swap Space" 
  } 
} 

collection_group { 
  collect_every = 40 
  time_threshold = 300 
  metric { 
    name = "bytes_out" 
    value_threshold = 4096 
    title = "Bytes Sent" 
  } 
  metric { 
    name = "bytes_in" 
    value_threshold = 4096 
    title = "Bytes Received" 
  } 
  metric { 
    name = "pkts_in" 
    value_threshold = 256 
    title = "Packets Received" 
  } 
  metric { 
    name = "pkts_out" 
    value_threshold = 256 
    title = "Packets Sent" 
  } 
}

/* Different than 2.5.x default since the old config made no sense */ 
collection_group { 
  collect_every = 1800 
  time_threshold = 3600 
  metric { 
    name = "disk_total" 
    value_threshold = 1.0 
    title = "Total Disk Space" 
  } 
}

collection_group { 
  collect_every = 40 
  time_threshold = 180 
  metric { 
    name = "disk_free" 
    value_threshold = 1.0 
    title = "Disk Space Available" 
  } 
  metric { 
    name = "part_max_used" 
    value_threshold = 1.0 
    title = "Maximum Disk Space Used" 
  } 
}

--- End Message ---

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
