This problem arose with gmond built from source, so I tried
installing it from the rpms on other nodes. That worked fine.
In fact, the nodes with gmond built from source started to receive messages
from the gmonds installed from the rpms.
However, the gmonds built from source just don't seem to send messages.
I built them with the following commands ...
./configure --with-gmetad --prefix=/usr/local/depot/ganglia-3.0.2
make
make install
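One way to check whether a particular gmond is actually putting metric packets
on the wire is to watch the multicast group with tcpdump (a rough sketch; it
assumes eth0 is the interface gmond is configured to send on):

tcpdump -i eth0 -n host 239.2.11.71 and udp port 8649

If the source-built gmond is transmitting, its packets should show up there
alongside the ones from the rpm-installed gmonds.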
This is no longer a problem for me, as I can use the rpm-installed gmond.
However, it is interesting that my source-built version didn't work fully.
Dave
David Robson wrote:
Hi,
I have just installed ganglia 3.0.2 on a cluster with Fedora Core 2.
I built it OK from the source.
For my default gmond.conf file, I used the output of
gmond -t
and then updated the cluster name field.
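(Roughly: gmond -t > /etc/gmond.conf, then edit the name field in the cluster
section -- the exact path depends on where your gmond looks for its config file.)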
The rest seems set up to use multicasting to share
metrics between gmond processes on different nodes.
The trouble is that it doesn't. Telnetting to the XML
port only downloads info for the local host. Ditto
when using gmetad.
It looks like the gmond processes aren't sharing their metrics.
Anyone have any idea what I am doing wrong?
netstat shows the following established socket:
udp        0      0 jac-20:38020        239.2.11.71:8649        ESTABLISHED 3355/gmond
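(For reference, multicast group memberships can be double-checked with
ip maddr show dev eth0
or with netstat -g; 239.2.11.71 should be listed against eth0 if gmond has
successfully joined the group.)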
Any help gratefully accepted. I have attached my gmond.conf file.
Dave Robson
------------------------------------------------------------------------
/* This configuration is as close to 2.5.x default behavior as possible
The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  setuid = yes
  user = nobody
  cleanup_threshold = 300 /* secs */
  debug_level = 200
}
/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
* of a <CLUSTER> tag. If you do not specify a cluster tag, then all <HOSTS> will
* NOT be wrapped inside of a <CLUSTER> tag. */
cluster {
name = "My Linux Cluster"
}
/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}
/* Feel free to specify as many udp_send_channels as you like. Gmond
used to only support having a single channel */
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  mcast_if = eth0
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
  mcast_if = eth0
}
/* You can specify as many tcp_accept_channels as you like to share
an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}
/* The old internal 2.5.x metric array has been replaced by the following
collection_group directives. What follows is the default behavior for
collecting and sending metrics that is as close to 2.5.x behavior as
possible. */
/* This collection group will cause a heartbeat (or beacon) to be sent every
20 seconds. In the heartbeat is the GMOND_STARTED data which expresses
the age of the running gmond. */
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}
/* This collection group will send general info about this host every 1200 secs.
This information doesn't change between reboots and is only collected once. */
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
  }
  metric {
    name = "cpu_speed"
  }
  metric {
    name = "mem_total"
  }
  /* Should this be here? Swap can be added/removed between reboots. */
  metric {
    name = "swap_total"
  }
  metric {
    name = "boottime"
  }
  metric {
    name = "machine_type"
  }
  metric {
    name = "os_name"
  }
  metric {
    name = "os_release"
  }
  metric {
    name = "location"
  }
}
/* This collection group will send the status of gexecd for this host every 300
secs */
/* Unlike 2.5.x the default behavior is to report gexecd OFF. */
collection_group {
  collect_once = yes
  time_threshold = 300
  metric {
    name = "gexec"
  }
}
/* This collection group will collect the CPU status info every 20 secs.
The time threshold is set to 90 seconds. In honesty, this time_threshold could be
set significantly higher to reduce unnecessary network chatter. */
collection_group {
  collect_every = 20
  time_threshold = 90
  /* CPU status */
  metric {
    name = "cpu_user"
    value_threshold = "1.0"
  }
  metric {
    name = "cpu_system"
    value_threshold = "1.0"
  }
  metric {
    name = "cpu_idle"
    value_threshold = "5.0"
  }
  metric {
    name = "cpu_nice"
    value_threshold = "1.0"
  }
  metric {
    name = "cpu_aidle"
    value_threshold = "5.0"
  }
  metric {
    name = "cpu_wio"
    value_threshold = "1.0"
  }
  /* The next two metrics are optional if you want more detail...
     ... since they are accounted for in cpu_system.
  metric {
    name = "cpu_intr"
    value_threshold = "1.0"
  }
  metric {
    name = "cpu_sintr"
    value_threshold = "1.0"
  }
  */
}
collection_group {
  collect_every = 20
  time_threshold = 90
  /* Load Averages */
  metric {
    name = "load_one"
    value_threshold = "1.0"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
  }
  metric {
    name = "load_fifteen"
    value_threshold = "1.0"
  }
}
/* This group collects the number of running and total processes */
collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
  }
}
/* This collection group grabs the volatile memory metrics every 40 secs and
sends them at least every 180 secs. This time_threshold can be increased
significantly to reduce unneeded network traffic. */
collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "mem_free"
    value_threshold = "1024.0"
  }
  metric {
    name = "mem_shared"
    value_threshold = "1024.0"
  }
  metric {
    name = "mem_buffers"
    value_threshold = "1024.0"
  }
  metric {
    name = "mem_cached"
    value_threshold = "1024.0"
  }
  metric {
    name = "swap_free"
    value_threshold = "1024.0"
  }
}
collection_group {
  collect_every = 40
  time_threshold = 300
  metric {
    name = "bytes_out"
    value_threshold = 4096
  }
  metric {
    name = "bytes_in"
    value_threshold = 4096
  }
  metric {
    name = "pkts_in"
    value_threshold = 256
  }
  metric {
    name = "pkts_out"
    value_threshold = 256
  }
}
/* Different than 2.5.x default since the old config made no sense */
collection_group {
  collect_every = 1800
  time_threshold = 3600
  metric {
    name = "disk_total"
    value_threshold = 1.0
  }
}
collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "disk_free"
    value_threshold = 1.0
  }
  metric {
    name = "part_max_used"
    value_threshold = 1.0
  }
}