Re: [Ganglia-general] newbie install of 3.1.7
On Tue, Jun 22, 2010 at 11:16:54AM -0700, Deb Heller-Evans wrote:
> In our setup, I am configuring gmond for unicast communication, and
> have set up gmond.conf on the nodes to have the following:
>
>     52 udp_recv_channel { --
>     53   host = 198.129.76.131
>     54   port = 8649
>     55 }
>     56
>
> BUT, when starting gmond on the node, gmond complains:
>
>     [108#] service gmond start
>     Starting GANGLIA gmond: /etc/ganglia/gmond.conf:53: no such option 'host'
>     Parse error for '/etc/ganglia/gmond.conf'
>     [FAILED]
>
> I'm a little puzzled by this. Could someone point me in the right
> direction?

man gmond.conf would show you that there is no host option for udp_recv_channel. The option you are probably looking for is bind, which tells ganglia to bind to a specific IP for the unicast listener.

Carlo

--
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
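A minimal sketch of the change Carlo describes, assuming 198.129.76.131 is an address configured on the machine that runs the listening gmond (the thread does not say so). In a unicast setup the reporting nodes point a udp_send_channel at that address with host, while the receiving side only accepts bind:

```
/* On the machine that receives the unicast metrics */
udp_recv_channel {
  bind = 198.129.76.131   # assumption: a local address on this machine
  port = 8649
}

/* On each node that reports to it */
udp_send_channel {
  host = 198.129.76.131   # address of the receiving gmond
  port = 8649
}
```

If the listener should accept traffic on every local address, bind can be left out and only port kept.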
Re: [Ganglia-general] Ganglia + Windows - Compilation Problems?
On Wed, Jun 23, 2010 at 01:40:52PM -0500, Douglas Wagner wrote:
> So I build libconfuse on Cygwin on my local XP development box and it
> gets stuck into /usr/local/* (lib, include, etc.).

Is it libconfuse 2.7, compiled as a static library and with no nls support, as suggested in README.WIN? Is this using cygwin 1.5 on 32bit windows, or are you using 1.7?

> Come back around (according to the README.WIN) and tell ganglia to
> compile --with-libconfuse=/usr/local and it blows up telling me it
> can't find libconfuse.

config.log would explain why, but I hope it is not that you are trying to build it for 64bit windows.

> Linking everything into /usr/lib doesn't help either. I've seen docs
> on this but assumed it was supposed to be fixed in 3.1.2.

Not sure what you are referring to here, but are you trying to build 3.1.7? I noticed that the README.WIN documents do not mention the need to override sysconfdir (which is irrelevant for cygwin anyway) and were not completely updated for that release when the libpcre dependency was added (it also changed name recently in cygwin). From what I remember this used to work at least with 3.1.4, and therefore probably also with 3.1.2.

The following seemed to work for me on an updated windows vista laptop I had access to, with the latest cygwin (mostly using the instructions from README.WIN, and against the recommendation of sticking with 1.5, which therefore requires some patching):

$ tar -xvzf confuse-2.7.tar.gz
$ cd confuse-2.7
$ ./configure --disable-nls
$ make
$ make install
$ cd ..
$ tar -xvzf ganglia-3.1.7.tar.gz
$ cd ganglia-3.1.7
$ find . -type f -name '*.h' -a ! -name config.h -exec fgrep -l rpc/rpc.h {} \; | xargs -n1 perl -pi -e 's;#include <rpc/rpc.h>;#include <cygwin/in.h>\n#include <rpc/rpc.h>;g'
$ ./configure GANGLIA_ACK_SYSCONFDIR=1 --with-libconfuse=/usr/local --enable-static-build
$ make
$ cd ..
$ mkdir dist
$ cp -a ganglia-3.1.7/gmond/gmond.exe dist/
$ cp -a ganglia-3.1.7/gmetric/gmetric.exe dist/
$ cp -a ganglia-3.1.7/gstat/gstat.exe dist/
$ cd confuse-2.7
$ make uninstall
$ cd ..
$ rm -rf confuse* ganglia*

The binaries in dist will need to be installed on the other nodes, probably together with the cygwin DLLs they were built with if cygwin won't be installed independently (cygwin1.dll, cygapr-1-0.dll, cygexpat-1.dll, cygpcre-0.dll, and libpython2.6.dll).

The following dependencies were installed as prerequisites on the system that was used for building this package (listed with `cygcheck.exe -c -d`):

diffutils          2.9-1
expat              2.0.1-1
libexpat1          2.0.1-1
libexpat1-devel    2.0.1-1
gcc                3.4.4-999
gcc-core           3.4.4-999
gcc-g++            3.4.4-999
gcc-mingw-core     20050522-1
gcc-mingw-g++      20050522-1
libgcc1            4.3.4-3
libapr1            1.4.2-1
libapr1-devel      1.4.2-1
make               3.81-2
libpcre-devel      8.02-1
libpcre0           8.02-1
python             2.6.5-2
sharutils          4.8-1
sunrpc             4.0-3

Carlo
Re: [Ganglia-general] ganglia monitors does not work for some of the clusters
On Thu, Jun 24, 2010 at 07:37:25AM +0200, Raimund Eimann wrote:
> I have exactly the same issue with version 3.1.7. When I restart gmond
> on the affected nodes, their graphs work for some time (1-2 days
> typically). I use CentOS 5.{4,5} on my nodes. Usually the problem does
> not affect a cluster as a whole, but only a large number of nodes in
> the cluster (for instance, for 14 out of 17 nodes nothing gets
> displayed).

Are you using multicast or unicast? Does setting send_metadata_interval to 60, or some other non-zero value, help?

Carlo
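For reference, the setting mentioned in the reply lives in the globals section of gmond.conf. The fragment below is only a sketch, assuming a unicast setup and the 60-second value suggested above. As far as I understand the gmond.conf man page, the default of 0 means metadata is sent when gmond starts (or when another gmond asks for it), so a receiver that restarts later may be left unable to interpret values from nodes that kept running:

```
globals {
  /* Resend metric metadata every 60 seconds.
     0 (the default) disables the periodic resend. */
  send_metadata_interval = 60
}
```

gmond has to be restarted on each node for the new value to take effect.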
Re: [Ganglia-general] Gmond udp_send_channel using the wrong network (seems hostname related)
On Thu, Jun 24, 2010 at 10:21:53AM +, Ronny wrote:
> I am facing the problem that my gmond udp_send_channel sends via the
> wrong network interface on a multi-homed linux machine.

There is some information on multihomed setups in the README which could help.

> The machines have a front NIC and a backend NIC. Both IPs from the
> NICs get resolved by the name service, but the primary IP's DNS name
> is the system's hostname (with an IP address out of 62.48.x.x).
>
> In my client's gmond.conf I have set:
>
> udp_send_channel {
>   bind_hostname = yes # Highly recommended, soon to be default.
>                       # This option tells gmond to use a source address
>                       # that resolves to the machine's hostname. Without
>                       # this, the metrics may appear to come from any
>                       # interface and the DNS names associated with
>                       # those IPs will be used to create the RRDs.
>   host = 10.0.11.16
>   port = 8649
>   ttl = 1
> }
>
> whereby 10.0.11.16 is the backend network. But this gmond seems to
> ignore 10.0.11.16 and sends via the primary IP address 62.48.x.x to
> the udp_recv_channel located on another host. A firewall between the
> send channel and receive channel machines using 62.48.x.x is blocking
> that traffic. I can't currently open the firewall.

That is what bind_hostname is meant to do, AFAIK. Maybe you would like to use this instead:

bind = 10.0.11.16

(host should point to your collector if using unicast, so host and bind should most of the time be different IPs in 10.0.11.x, unlike in this example.)

Carlo
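A sketch of what the suggested configuration could look like on the client. It assumes 10.0.11.16 is the client's own backend address and uses 10.0.11.20 as a purely hypothetical collector address; neither the collector IP nor the decision to drop bind_hostname comes from the thread. As I read the gmond.conf man page, bind and bind_hostname are mutually exclusive, so only one is kept here:

```
udp_send_channel {
  bind = 10.0.11.16   # source address: this client's backend IP
  host = 10.0.11.20   # hypothetical: the collector on the backend network
  port = 8649
  ttl = 1
}
```

With this layout the source address no longer depends on what the machine's hostname resolves to, which is the behaviour the original bind_hostname = yes was producing.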
Re: [Ganglia-general] Gmond udp_send_channel using the wrong network (seems hostname related)
Sounds to me like your routing is not properly set, although apparently that can depend on the OS. More than 4 years ago I reported a bug regarding gmond not honoring the mcast_if setting:

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=94

We resolved it by adding a route. It would seem that in unicast mode this should require no changes. Can you send us what your routing table looks like?

On Thu, 24 Jun 2010 at 10:21 +, Ronny wrote:
> I am facing the problem that my gmond udp_send_channel sends via the
> wrong network interface on a multi-homed linux machine.
>
> The machines have a front NIC and a backend NIC. Both IPs from the
> NICs get resolved by the name service, but the primary IP's DNS name
> is the system's hostname (with an IP address out of 62.48.x.x).
>
> In my client's gmond.conf I have set:
>
> udp_send_channel {
>   bind_hostname = yes # Highly recommended, soon to be default.
>                       # This option tells gmond to use a source address
>                       # that resolves to the machine's hostname. Without
>                       # this, the metrics may appear to come from any
>                       # interface and the DNS names associated with
>                       # those IPs will be used to create the RRDs.
>   host = 10.0.11.16
>   port = 8649
>   ttl = 1
> }
>
> whereby 10.0.11.16 is the backend network. But this gmond seems to
> ignore 10.0.11.16 and sends via the primary IP address 62.48.x.x to
> the udp_recv_channel located on another host. A firewall between the
> send channel and receive channel machines using 62.48.x.x is blocking
> that traffic. I can't currently open the firewall.
>
> What should I do to let gmond communicate exclusively via the
> 10.0.11.x network?
>
> I am running ganglia 3.1.7.
Re: [Ganglia-general] Gmond udp_send_channel using the wrong network (seems hostname related)
On Sat, Jun 26, 2010 at 03:29:17PM -0400, Vladimir Vuksan wrote:
> More than 4 years ago I reported a bug regarding gmond not honoring
> the mcast_if setting
>
> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=94

mcast_if should be working fine in 3.0 since 3.0.5, could you confirm that? You should now be able to force multicast traffic through a specific interface by adding mcast_if to the corresponding udp_send_channel section.

It was broken again in 3.1, though, and while it was fixed for 3.1.2 (as shown by BUG140), you would need 3.1.7 for a full fix and for the set of directives meant to control all parts of this functionality, including the IP used as the source (which is what bind and bind_hostname are for), independently of the interface or IPv4 routing.

> We resolved it by adding a route. It would seem that in unicast mode
> this should require no changes. Can you send us what your routing
> table looks like?

Unicast could use a different IP as the source if instructed to do so by explicitly binding to it, or to the resolvable hostname, as the originally reported configuration appeared to do.

I agree, though, that the documentation is a little thin around all of it (there is also some complementary explanation in the README), especially with 3.1.7, which now has several overriding settings that affect this (routing, mcast_if, and bind/bind_hostname).

Carlo
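To make the settings named above a little more concrete, here is a sketch of a multicast send channel pinned to one interface. The interface name eth1 and the source address are hypothetical, 239.2.11.71 is the multicast group used in the stock gmond.conf, and whether bind is needed at all alongside mcast_if depends on the setup, so treat this as an illustration rather than a tested configuration:

```
udp_send_channel {
  mcast_join = 239.2.11.71   # group from the stock gmond.conf
  mcast_if = eth1            # hypothetical: interface the multicast traffic should leave through
  bind = 10.0.11.16          # hypothetical: source IP to use, set independently of interface and routing
  port = 8649
  ttl = 1
}
```

Anything these directives do not cover is still decided by the routing table, which is why adding a route also worked in the earlier report.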