[Ganglia-general] Nfs
Sorry if this is a dead horse, but how can I get stats on NFS mounts? I tried editing the local mounts line to include nfs, but I still don't see it. I'm running RHEL and Ganglia 3.7.1.
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
[Ganglia-general] Python modules, NVIDIA, modpython.conf
Hi,

I'm having some issues configuring the Python modules for Ganglia on an Ubuntu 14.04 box. It has the standard install of gmond (3.5.0) from packages, as well as the additional modules and Python modules from packages. The problem is that the additional modules work, but the Python modules do not appear to. These are the ones downloaded from https://github.com/ganglia/gmond_python_modules/

I've tried gmond.conf both with and without the following:

    module {
      name = python_module
      path = /usr/lib/ganglia/modpython.so
      params = /usr/lib/ganglia/python_modules/
    }

In particular, including mod_python.conf anywhere (conf.d) causes the error 'no such option 'param''. I've amended paths where necessary to match the directory in which the Python module files are installed: /usr/lib/ganglia/python_modules.

The same setup works on a 12.04 box set up by my predecessor. I'm wondering whether something needed to be fixed that wasn't documented, or whether it's version-specific.

Any help would be much appreciated. I've used Ganglia several times in the past and I can't really imagine a cluster without it.

Cheers
Mike
[Ganglia-general] cygwin 1.7 + ganglia 3.5
Dear all,

I am trying to compile Ganglia 3.5 on Cygwin 1.7 and am facing the same issue described on GitHub: https://github.com/ganglia/monitor-core/issues/96

Is there any solution for this? Thanks for any suggestions!

Best regards,
Mike
[Ganglia-general] compiling on cygwin 1.7
Dear all,

I am trying to compile Ganglia 3.5 or 3.6 on Cygwin but get the following error. I have seen the previous thread and followed the procedure, with no luck. Thanks for any suggestions!

Installed packages:

    cygwin          1.7.25
    libexpat-devel  2.1.0-3
    libexpat1       2.1.0-3
    libapr1         1.4.8-1
    libapr1-devel   1.4.8-1
    gcc-core        4.7.3-1
    gcc-g++         4.7.3-1
    libgcc1         4.7.3-1
    libpcre-devel   8.33-1
    libpcre1        8.33-1
    pkg-config      0.23b-10
    python          2.7.3-1
    sunrpc          4.0-3

I compiled and installed confuse-2.7:

    ./configure --disable-nls
    make
    make install

Then I tried to compile Ganglia:

    ./configure --with-libconfuse=/usr/local --without-libpcre --enable-static-build

and got the following error when compiling libmetrics:

    make[4]: Entering directory `/home/mike/ganglia-3.6.0/libmetrics/cygwin'
    /bin/sh ../libtool --tag=CC --mode=compile gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -I.. -I../../lib -I../../include -g -O2 -Wall -MT metrics.lo -MD -MP -MF .deps/metrics.Tpo -c -o metrics.lo metrics.c
    libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -I.. -I../../lib -I../../include -g -O2 -Wall -MT metrics.lo -MD -MP -MF .deps/metrics.Tpo -c metrics.c -DDLL_EXPORT -DPIC -o .libs/metrics.o
    In file included from /usr/include/cygwin/in.h:267:0,
                     from /usr/include/netinet/in.h:14,
                     from ../unpifi.h:22,
                     from ../interface.h:10,
                     from metrics.c:33:
    /usr/include/cygwin/in6.h:75:8: error: redefinition of ‘struct in6_addr’
    In file included from /usr/lib/gcc/i686-pc-cygwin/4.7.3/../../../../include/w32api/mprapi.h:10:0,
                     from /usr/lib/gcc/i686-pc-cygwin/4.7.3/../../../../include/w32api/iprtrmib.h:9,
                     from /usr/lib/gcc/i686-pc-cygwin/4.7.3/../../../../include/w32api/iphlpapi.h:13,
                     from metrics.c:18:
    /usr/lib/gcc/i686-pc-cygwin/4.7.3/../../../../include/w32api/ras.h:19:16: note: originally defined here
    metrics.c: In function ‘proc_run_func’:
    metrics.c:720:46: warning: variable ‘cProcesses’ set but not used [-Wunused-but-set-variable]
    Makefile:259: recipe for target `metrics.lo' failed
    make[4]: *** [metrics.lo] Error 1
    make[4]: Leaving directory `/home/mike/ganglia-3.6.0/libmetrics/cygwin'
    Makefile:361: recipe for target `all-recursive' failed
    make[3]: *** [all-recursive] Error 1
    make[3]: Leaving directory `/home/mike/ganglia-3.6.0/libmetrics'
    Makefile:247: recipe for target `all' failed
    make[2]: *** [all] Error 2
    make[2]: Leaving directory `/home/mike/ganglia-3.6.0/libmetrics'
    Makefile:370: recipe for target `all-recursive' failed
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/home/mike/ganglia-3.6.0'
    Makefile:287: recipe for target `all' failed
    make: *** [all] Error 2

Many thanks for any suggestions!

Best regards,
Mike
Re: [Ganglia-general] custom python module graphs update values only on gmond restart
The root of the problem was in my Python: once I moved the connection creation statement into the metric handler method, the graphs updated as expected.

On Wed, Dec 7, 2011 at 11:32 AM, Mike Broers mbro...@gmail.com wrote:

I created a Python module to graph the results of a Postgres query. When I invoke the program manually by calling python postgres.py, I get the results I expect (they change). When I put the module and pyconf into the Ganglia folders and restart gmond, I get a graph that stays constant until I restart gmond again. Whenever I restart gmond, the values get updated and the graphs change, but then they remain constant until the next restart.

Here are the .py and .pyconf files. I'm unclear whether a conf change or update needs to take place elsewhere to get these new Python module metrics collected on the interval, perhaps on the gmetad side? I have collect_every = 10, so I would assume it knows to collect more than once.

    #
    # postgres.py
    #
    import psycopg2

    # set up postgres connection
    pgdsn = "dbname=qa host=localhost user=postgres port=6543 password="
    db_conn = psycopg2.connect(pgdsn)

    def pg_active(name):
        pg_active_sql = "select count(*)::integer as count from pg_stat_activity where current_query <> '<IDLE>' and current_query <> '<IDLE> in transaction'"
        db_curs = db_conn.cursor()
        db_curs.execute(pg_active_sql)
        pg_active_sql_results = db_curs.fetchall()
        (count_active,) = pg_active_sql_results[0]
        pg_active_count = int(count_active) - 1
        return pg_active_count
        db_curs.close()
        db_conn.close()

    def metric_init(params):
        global descriptors
        d3 = {'name': 'Pypg_active_sessions',
              'call_back': pg_active,
              'time_max': 90,
              'value_type': 'uint',
              'units': 'Sessions',
              'slope': 'both',
              'format': '%u',
              'description': 'PG Active Sessions',
              'groups': 'Postgres'}
        descriptors = [d3]
        return descriptors

    def metric_cleanup():
        '''Clean up the metric module.'''
        pass

    # This code is for debugging and unit testing
    if __name__ == '__main__':
        metric_init({})
        for d in descriptors:
            v = d['call_back'](d['name'])
            print 'value for %s is %u' % (d['name'], v)

    #
    # postgres.pyconf
    #
    modules {
      module {
        name = "postgres"
        language = "python"
      }
    }

    collection_group {
      collect_every = 10
      time_threshold = 50
      metric {
        name = "Pypg_active_sessions"
        title = "Postgres Active Sessions"
        value_threshold = 1
      }
    }

Thanks for reviewing!
Mike
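The fix described at the top of this thread (creating the connection inside the metric handler) can be sketched as below. This is a minimal sketch under stated assumptions, not the poster's final code: the SQL operators and <IDLE> markers are a reconstruction of text lost in the archive, the `connect` parameter is a hypothetical hook added only so the callback can be exercised without a database, and psycopg2 is assumed to be installed for the interpreter gmond embeds. One plausible explanation for the original symptom, which the thread does not state, is that a module-level connection stays inside a single open transaction, and PostgreSQL keeps serving the same statistics snapshot until that transaction ends.

```python
# postgres.py: sketch of a gmond Python metric module (see assumptions above).

# DSN taken from the original post; adjust for your environment.
PG_DSN = "dbname=qa host=localhost user=postgres port=6543"

# Assumption: reconstructed query, not the poster's verbatim SQL.
PG_ACTIVE_SQL = (
    "select count(*)::integer from pg_stat_activity "
    "where current_query <> '<IDLE>' "
    "and current_query <> '<IDLE> in transaction'"
)

descriptors = []


def pg_active(name, connect=None):
    """Metric callback: open a fresh connection, count sessions, close it.

    gmond calls this with the metric name only; `connect` is a hypothetical
    testing hook and defaults to psycopg2.connect.
    """
    if connect is None:
        import psycopg2  # imported lazily so the module loads without it
        connect = psycopg2.connect
    conn = connect(PG_DSN)
    try:
        cur = conn.cursor()
        cur.execute(PG_ACTIVE_SQL)
        (count_active,) = cur.fetchone()
        cur.close()
    finally:
        conn.close()  # always release the connection, even on error
    # Subtract this module's own session, as the original code did;
    # clamp at zero because the metric is declared as 'uint'.
    return max(int(count_active) - 1, 0)


def metric_init(params):
    """Called once by gmond; returns the list of metric descriptors."""
    global descriptors
    descriptors = [{
        'name': 'Pypg_active_sessions',
        'call_back': pg_active,
        'time_max': 90,
        'value_type': 'uint',
        'units': 'Sessions',
        'slope': 'both',
        'format': '%u',
        'description': 'PG Active Sessions',
        'groups': 'Postgres',
    }]
    return descriptors


def metric_cleanup():
    """Nothing to clean up: no connection outlives a single callback."""
    pass
```

Opening a connection on every collection is the simplest way to avoid stale state at a 10-second interval; a long-lived connection in autocommit mode may work as well, but that variant is not something the thread tested.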
Re: [Ganglia-general] gmetad polling another gmetad data source broken in 3.2.0?
Mark Wagner mwagner at intelius.com writes:

This is the patch I ended up using:

    diff -urN ganglia-3.2.0.dist/gmetad/process_xml.c ganglia-3.2.0/gmetad/process_xml.c
    --- ganglia-3.2.0.dist/gmetad/process_xml.c 2011-07-07 08:44:35.0 -0700
    +++ ganglia-3.2.0/gmetad/process_xml.c 2011-10-21 15:18:31.0 -0700
    @@ -1172,6 +1172,7 @@
     {
     case GRID_TAG:
     rc = endElement_GRID(data, el);
    +rc = endElement_CLUSTER(data, el);
     break;
     case CLUSTER_TAG:

Seems to be working quite nicely so far (though much work remains on my side). Thanks for the info and quick response.

Has this already been accepted as a patch upstream? Anything we can do to ensure others don't run into this issue will surely be appreciated.

Thanks again,
-- MikeE
Re: [Ganglia-general] Ganglia Installation Issues
Hi Antonio,

I can finally see Ganglia up and running in the web interface. I restarted everything and now it's fine. Thanks a lot for your help.

Now I am looking at monitoring Hadoop using Ganglia. I added the metrics properties to the hadoop-metrics properties. Is there something else I have to do to see the Hadoop metrics in Ganglia?

Thanks,
Mike

--- On Tue, 12/7/10, Antonio Óscar Balmaseda antonio.o.balmas...@gmail.com wrote:

2010/12/6 Mike nano_kol...@yahoo.com

    Yes, I have the web folder copied to /var/www/ganglia.

    Do we have to keep this in gmond.conf?

        tcp_accept_channel {
          port = 8649
        }

    Trying to start gmond with this included in the conf gave me the error "Unable to create tcp_accept_channel", so I removed it from gmond.conf.

    Thanks,
    Mike

If I'm not wrong, this error is shown when gmond is already running. You have to keep these lines for the system to work. Try re-adding the lines, stopping gmond, and restarting; it should work fine.

One question: you can see the Ganglia website in any case, can't you?

Regards,
Antonio
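Antonio's suggestion that the tcp_accept_channel error appears when gmond is already running can be checked before starting a second instance. The following is a minimal Python sketch, assuming the error comes from port 8649 already being bound (the thread does not confirm this); `port_in_use` is a hypothetical helper name, not part of Ganglia.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE.

    This mirrors what a daemon does at startup: it tries to bind the port,
    and the bind fails if another process (for example an already running
    gmond) is holding it.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # other errors (e.g. permission denied) are not "in use"
    finally:
        probe.close()
    return False
```

If `port_in_use(8649)` returns True while no gmond is meant to be running, stopping the stray process before restarting gmond with tcp_accept_channel restored is the step Antonio describes.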
Re: [Ganglia-general] Ganglia Installation Issues
Hi Antonio,

Thanks much for your response. I ran /usr/sbin/update-rc.d -f gmond defaults and /usr/sbin/update-rc.d -f gmetad defaults, which initially gave me the error:

    update-rc.d: /etc/init.d/gmond: file does not exist

The init script was in /etc/rc.d/init.d/, so I copied it to /etc/init.d/gmond, and then update-rc.d went fine.

I still cannot view the web interface: when I point to http://ip_address/ganglia/ I get "The server at ip_address is taking too long to respond."

Here is some relevant information:

A) I start gmond with /usr/sbin/gmond. When I telnet to EC2Ip_address 8649, it gives:

    HOST NAME=EC2Ip_address IP=10.251.86.192 REPORTED=1291666741 TN=30 TMAX=20 DMAX=0 LOCATION=EC2Ip_address GMOND_STARTED=129121

B) /usr/sbin/gmond -d 10 gives me this:

    Got a heartbeat message 1291666768
    metric 'cpu_user' being collected now
    metric 'cpu_user' has value_threshold 1.00
    metric 'cpu_system' being collected now
    metric 'cpu_system' has value_threshold 1.00
    metric 'cpu_idle' being collected now
    metric 'cpu_idle' has value_threshold 5.00
    metric 'cpu_nice' being collected now
    metric 'cpu_nice' has value_threshold 1.00
    metric 'cpu_aidle' being collected now
    metric 'cpu_aidle' has value_threshold 5.00
    metric 'cpu_wio' being collected now
    metric 'cpu_wio' has value_threshold 1.00
    metric 'load_one' being collected now
    metric 'load_one' has value_threshold 1.00
    metric 'load_five' being collected now
    metric 'load_five' has value_threshold 1.00
    metric 'load_fifteen' being collected now
    metric 'load_fifteen' has value_threshold 1.00
    sent message 'heartbeat' of length 56 with 0 errors
    Processing a metric value message from EC2_IP
    Got a heartbeat message 1291667489

   and so on.

C) /usr/sbin/gmetad -d 10 gives me this:

    Going to run as user nobody
    Sources are ...
    Source: [MyCluster, step 15] has 1 sources
    10.251.86.192
    xml listening on port 8651
    interactive xml listening on port 8652
    cleanup thread has been started
    Data thread 1147169104 is monitoring [MyCluster] data source
    10.251.86.192
    [MyCluster] is a 2.5 or later data stream
    hash_create size = 1024
    hash-size is 1031
    hash_create size = 50
    hash-size is 53
    hash_create size = 50
    hash-size is 53
    [MyCluster] is a 2.5 or later data stream
    [MyCluster] is a 2.5 or later data stream
    [MyCluster] is a 2.5 or later data stream
    [MyCluster] is a 2.5 or later data stream
    [MyCluster] is a 2.5 or later data stream
    [MyCluster] is a 2.5 or later data stream
    ...etc

D) Here are the relevant parts of my /etc/ganglia/gmond.conf:

    cluster {
      name = MyCluster
      owner = myclusterowner
      latlong = unspecified
      url = unspecified
    }
    host {
      location = IP_of_EC2
    }
    udp_send_channel {
      mcast_join = IP_of_EC2
      port = 8666
      ttl = 1
    }
    udp_recv_channel {
      port = 8666
      family = inet4
    }

   And gmetad.conf has:

    data_source MyCluster ipaddress:8649

Any help on this would be highly appreciated!

Thanks,
Mike

--- On Sun, 12/5/10, Antonio Óscar Balmaseda antonio.o.balmas...@gmail.com wrote:

Hey, Mike,

2010/12/5 Mike nano_kol...@yahoo.com

    Hi all,

    I am trying to get Ganglia running on an Ubuntu instance. I built version 3.1.7 from source; the libs were installed in /etc/ganglia/lib64/ganglia/. I used the commands:

        ./configure --prefix=/etc/ganglia --with-gmetad --sysconfdir=/etc/ganglia
        make
        make install

    and everything went fine. I changed GMOND in gmond/gmond.init to GMOND=/etc/ganglia/sbin/gmond, and GMETAD in gmetad/gmetad.init to GMETAD=/etc/ganglia/sbin/gmetad.

    1. I copied gmond/gmond.init from the build directory to /etc/rc.d/init.d/gmond. When I start gmond with /etc/rc.d/init.d/gmond start, I get the following error:

        .: 9: Can't open /etc/rc.d/init.d/functions

    I also copied gmetad/gmetad.init to /etc/rc.d/init.d/gmetad, and starting it fails with the same error. What do these scripts expect in /etc/rc.d/init.d/functions?

    When I try something like gmond -d 1 to start gmond in the foreground, it gives this message:

        [PYTHON] Can't open the python module path /etc/ganglia/lib64/ganglia/python_modules. Module python_module failed to initialize.

In this case, you have to add gmond and gmetad to the startup applications. You can do this with:

    $ sudo update-rc.d -f gmond defaults
    $ sudo update-rc.d -f gmetad defaults

The other problem can be solved by creating this directory, or by commenting out the line that looks for it in /etc/ganglia/gmond.conf. I'm not sure if it's necessary, but the owner of this folder is 'nobody' in my
Re: [Ganglia-general] Ganglia Installation Issues
Yes, I have the web folder copied to /var/www/ganglia.

Do we have to keep this in gmond.conf?

    tcp_accept_channel {
      port = 8649
    }

Trying to start gmond with this included in the conf gave me the error "Unable to create tcp_accept_channel", so I removed it from gmond.conf.

Thanks,
Mike

--- On Mon, 12/6/10, Antonio Óscar Balmaseda antonio.o.balmas...@gmail.com wrote:

How is it going? That's weird. Did you copy the files from ganglia-X.YY/web into /var/www? Because it seems that gmond and gmetad are working fine...

Antonio.
[Ganglia-general] Ganglia Installation Issues
Hi all,

I am trying to get Ganglia running on an Ubuntu instance. I built version 3.1.7 from source; the libs were installed in /etc/ganglia/lib64/ganglia/. I used the commands:

    ./configure --prefix=/etc/ganglia --with-gmetad --sysconfdir=/etc/ganglia
    make
    make install

and everything went fine. I changed GMOND in gmond/gmond.init to GMOND=/etc/ganglia/sbin/gmond, and GMETAD in gmetad/gmetad.init to GMETAD=/etc/ganglia/sbin/gmetad.

1. I copied gmond/gmond.init from the build directory to /etc/rc.d/init.d/gmond. When I start gmond with /etc/rc.d/init.d/gmond start, I get the following error:

    .: 9: Can't open /etc/rc.d/init.d/functions

   I also copied gmetad/gmetad.init to /etc/rc.d/init.d/gmetad, and starting it fails with the same error. What do these scripts expect in /etc/rc.d/init.d/functions?

   When I try something like gmond -d 1 to start gmond in the foreground, it gives this message:

    [PYTHON] Can't open the python module path /etc/ganglia/lib64/ganglia/python_modules. Module python_module failed to initialize.

2. I am running on an EC2 instance. In gmetad.conf I made the following changes:

   a. data_source MyCluster 'internalIP of the instance' (or should I use the external IP of the instance?)
   b. What should I set for the user gmetad will setuid to (defaults to nobody)? My rrd directory is at /var/lib/ganglia/rrds and is owned by root. Should I set the user here to root?

   In gmond.conf I have the following:

    cluster {
      name = MyCluster
      owner = MyOwner
      latlong = unspecified
      url = unspecified
    }
    host {
      location = unspecified
    }
    udp_send_channel {
      mcast_join = 239.2.11.71
      port = 8649
      ttl = 1
    }
    udp_recv_channel {
      mcast_join = 239.2.11.71
      port = 8649
      bind = 239.2.11.71
    }
    tcp_accept_channel {
      port = 8649
    }

   (Again, what should go in the host location?)

All other conf parameters are unchanged. Please help me with this.

Thanks,
Mike
Re: [Ganglia-general] ganglia installation
Thanks much, Bernard. I shall try installing the latest version from scratch and get back to you if I am stuck.

From: Bernard Li bern...@vanhpc.org
To: Mike nano_kol...@yahoo.com
Cc: Ganglia ganglia-general@lists.sourceforge.net
Sent: Thu, October 14, 2010 7:23:36 PM
Subject: Re: [Ganglia-general] ganglia installation

Hi Mike:

When responding, please make sure you reply-all so that replies are sent back to the list. This ensures that our discussions are archived for future users encountering the same issue, thanks!

On Mon, Oct 11, 2010 at 2:53 PM, Mike nano_kol...@yahoo.com wrote:

    Thanks much for your response. I had tried installing version 3.1 before but it didn't help. My instance is x86_64 and the Ganglia README.txt says that Ganglia runs on Linux i386, ia64, sparc, alpha, powerpc, m68k, mips, arm, hppa, s390t? So I was skeptical about that.

Thanks for pointing that out. I have recently fixed this in our development branch (trunk) but forgot to backport it to our 3.0 and 3.1 trees. I've just merged the changes to the 3.1 tree, so the documentation will be updated in the next 3.1 release:

https://sourceforge.net/apps/trac/ganglia/changeset/2349

    Sorry, I am new to hadoop/ganglia and it would be really helpful if you can send me some tutorials where I can get the step by step process of installation.

Installing Ganglia under EC2 is no different from any other environment, so any guide should suffice, for example this one:

http://www.jansipke.nl/installing-ganglia-on-centos

EC2-specific gotchas are mentioned here:

http://www.cultofgary.com/2008/10/16/ec2-and-ganglia/

I have added a link to this page in our Wiki page as well:

https://sourceforge.net/apps/trac/ganglia/wiki/ganglia_configuration

    Also my EC2 instance is not under any cluster name. So does it make sense to put some cluster name in gmetad.conf? Or is the cluster name I put in here independent of all those?

It's mostly for your reference. What ends up being shown on the frontend is actually Cluster from gmond.conf.

    Also I installed ganglia-web RPM as:

        wget -c http://downloads.sourceforge.net/ganglia/ganglia-web-3.1.0-1.el4.noarch.rpm
        rpm -ivh ganglia-gmetad-3.1.0-1.el4.i386.rpm ganglia-web-3.1.0-1.el4.noarch.rp

I can't remember if ganglia-web 3.1 is compatible with gmetad/gmond 2.5, but regardless, I think if you're just starting out, you should use 3.1.7 across the board.

Also make sure that you have started apache/httpd so that it will start serving pages of the frontend.

    How do I check the apache error_logs?

That is usually in /var/log/httpd.

Cheers,
Bernard
[Ganglia-general] ganglia installation
Hi all, I am trying to install ganglia on a single EC2 instance (Fedora x86_64 GNU/Linux). I want to use ganglia to monitor hadoop performance. Hadoop installation is successful. I installed everything for ganglia (2.5.7) on this same instance, following the steps at http://wiki.appnexus.com/display/documentation/Monitoring+Instances+Using+Ganglia but using x86_64 rpms instead of i386. I haven't changed anything in gmetad.conf except for adding this: data_source unspecified localhost. My gmond.conf has all the default values; I have modified nothing in it. I also set hadoop-metrics.properties as explained in http://developer.yahoo.com/hadoop/tutorial/module7.html#ganglia with mapred.servers, dfs.servers and jvm.servers set to localhost:8649. When I view the page http://hostnameofEC2instance/ganglia it doesn't display anything; it waits for some time and then says the page cannot be displayed because it takes too long to respond. I am able to start gmond, gmetad etc. (service gmetad start) and can do telnet localhost 8649 to display the XML. Also, running gmond --debug=9 or gmetad --debug=9 keeps displaying messages. Can anyone help me with this, if I am going wrong somewhere? Thanks, Michael
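When debugging a setup like the one above, it can help to parse the XML that gmond serves on its TCP port rather than reading it by eye in telnet. The following is a minimal Python sketch, not part of Ganglia: `fetch_xml` and `summarize` are illustrative names, port 8649 is taken from the message above, and `SAMPLE_XML` is a made-up, heavily trimmed stand-in for a real dump.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample shaped like a gmond dump (real output has more attributes).
SAMPLE_XML = b"""<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="unspecified" LOCALTIME="1287100000">
<HOST NAME="localhost" IP="127.0.0.1" REPORTED="1287100000">
<METRIC NAME="mem_free" VAL="475380" TYPE="uint32" UNITS="KBs"/>
<METRIC NAME="load_one" VAL="0.12" TYPE="float" UNITS=""/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""


def fetch_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything the daemon sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            buf = sock.recv(4096)
            if not buf:  # peer closed: the dump is complete
                break
            chunks.append(buf)
    return b"".join(chunks)


def summarize(xml_bytes):
    """Return {cluster_name: {host_name: {metric_name: value_string}}}."""
    root = ET.fromstring(xml_bytes)
    summary = {}
    for cluster in root.iter("CLUSTER"):
        hosts = summary.setdefault(cluster.get("NAME"), {})
        for host in cluster.iter("HOST"):
            hosts[host.get("NAME")] = {
                m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")
            }
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE_XML))
```

Against a live daemon one would call `summarize(fetch_xml())`. An empty result suggests the daemon answers but has no hosts to report, which is a different problem from the frontend timing out.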
[Ganglia-general] Adding a custom view - how?
All; I have a Ganglia server running 3.0.3. In my gmetad.conf, I have data sources defined that look like: data_source Development xx.yy.zz.aa:8650 data_source Production xx.yy.zz.bb:8650 This works great, but what I now need to do is provide some custom views for my management, so that they can see the graphs only for hosts assigned to a specific group, i.e. Development [host1 host2 host3 host4] on one page, and Development [host5 host6 host7 host8] on another, without disturbing the existing pages. What's the best way to accomplish this?
[Ganglia-general] any ideas as to where to start with this error
I have a web server and I am receiving this error: There was an error collecting ganglia data (127.0.0.1:8652): XML error: SYSTEM or PUBLIC, the URI is missing at 1 Which configuration file do I need to change, and what do I change? I checked /var/www/ganglia/ganglia.php, but line 1 is only the top of the file with no config options. Thanks for your help, Mike
[Ganglia-general] Warning: fsockopen() [function.fsockopen]
I am currently having a problem displaying my cluster information in a web browser. The odd thing is that when my friend pulls my data from my network and displays it on his Apache server, I can see it in a browser. The message I receive when I try to display the info from my Apache web server is below: Warning: fsockopen() [function.fsockopen]: unable to connect to 127.0.0.1:8652 (Connection refused) in /var/www/ganglia/ganglia.php on line 283 There was an error collecting ganglia data (127.0.0.1:8652): fsockopen error: Connection refused Any suggestions would be greatly appreciated. Thanks, Mike
[Ganglia-general] additional info about fsock open error
Just an FYI, I have the ports 8649 to 8652 forwarded on my router to my Apache web server. I have looked at the file on line 283 and I don't know what part of that line is creating the error. The line is below: $fp = fsockopen( $ip, $port, $errno, $errstr, $timeout); Thanks, Mike
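A "Connection refused" from fsockopen() means nothing accepted the TCP connection on 127.0.0.1:8652 from the web server's point of view, which can be checked independently of the PHP frontend. A small Python sketch of such a check follows; `port_accepts_connections` is an illustrative helper, not a Ganglia tool, and the port numbers are the ones mentioned in this thread.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts, unreachable hosts
        return False


if __name__ == "__main__":
    # Ports taken from the messages above; adjust to the local configuration.
    for port in (8649, 8651, 8652):
        state = "open" if port_accepts_connections("127.0.0.1", port) else "closed"
        print(f"127.0.0.1:{port} is {state}")
```

If 8652 shows as closed on the machine that runs the frontend, the likelier suspects are that gmetad is not running there or is listening on a different port. Router port forwarding does not affect connections to 127.0.0.1.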
[Ganglia-general] fsockopen problem
Should I remove the webserver box as a data source from my gmond and gmetad config files? That box is also the ndb manager of my cluster.
[Ganglia-general] fsockopen problem
I disabled forwarding of ports 8649-8652 on my router. I checked the gmond.conf and gmetad.conf files and I am not forwarding those ports to the Apache server from those. The Apache server is also the gmetad server. I tried browsing to it, but I still receive the same error message.
[Ganglia-general] fsockopen problem
I tried browsing to both the external IP address and the internal IP address of my web server, and I'm getting the same error message.
[Ganglia-general] fsockopen problem
I've changed the line at 283 and now I get this error: There was an error collecting ganglia data (127.0.0.1:8652): XML error: SYSTEM or PUBLIC, the URI is missing at 1 What next?
Re: [Ganglia-general] [Earthlink] Re: Ganglia 3.0.5 final RC
Only speaking for what is happening on OSX.

The original issue (before the patches): After reading all the data from the for(;;) loop, we would read a SYS_CALL buffer, determine that POLLHUP was set, and throw out the entire message when we set d->dead=1 and did a goto take_a_break; Thus we were not getting any indication of an error; gmetad just would not work correctly on OSX.

With the RC release with the patch:
A) As we go into the if (struct_poll.revents & POLLIN) and do a SYS_CALL on 1023 bytes, we get back X bytes_read.
B) Then, doing an 'if' on POLLHUP, we find that POLLHUP is set and would normally just do a 'break', which would take us out of the for (;;) loop and attempt to process the XML data. However, with this patch we get XML parser errors, and thus the incomplete messages are thrown out. (The warning occurs when running in debug mode, and gmetad still does not work correctly on OSX.)

To test the theory: IF we do another SYS_CALL for another 1023 bytes AFTER the check for POLLHUP (and before the 'break;'), an additional Y bytes are read from the system socket buffer. Thus, most of the time, we never receive the entire message before we hit the POLLHUP break, and thus lose the entire message.

I have only done code inspection of the OSX kernel (haven't compiled the kernel in debug), but it appears to set POLLHUP not when the application is done reading (as this 'if' statement assumes), nor on a lost connection as suggested in the standard, but at some other time well before we are done reading valid data off the socket buffer. Thus, at this point, I would not even attempt to test for POLLHUP on OSX.

Did that explain what we are seeing on OSX?

Mike

On Sep 18, 2007, at 7:47 AM, Brad Nicholes wrote:

On 9/17/2007 at 9:23 PM, in message [EMAIL PROTECTED], Mike Walker [EMAIL PROTECTED] wrote:

Bernard, No go. This doesn't have the patch that I sent to work the OSX issues in gmetad. It does have the suggestion by Brad, of putting an if statement in the read loop to test for the POLLHUP. However, from the previous beta (3.0.5 on ~ Sept 10th) testing cycle and my email response back to the list after that beta, his suggestion doesn't work on OSX. The reason is that the kernel is done reading off the socket and sets the POLLHUP flag BEFORE gmetad finishes reading the entire buffer. Thus, by breaking out of the read loop before the entire buffer is read, we get an incomplete message, and thus the messages are discarded by the XML parser. The discarded messages result in incorrect display in the ganglia PHP, by stating that machines are down, gaps in monitoring, etc.

I am sure that you are correct, so help me understand what is going on here. From what I could get from Google searches, different platforms indicate an EOF in different ways. Some set just POLLIN and then indicate EOF by checking bytes_read == 0 after a read(). In this case an revents of POLLHUP only indicates a broken connection. However, other platforms send a POLLIN | POLLHUP with the POLLHUP indicating the EOF. In this way an extra read() looking for bytes_read==0 would be unnecessary. A final read() can be done and EOF determined all in the same operation.

In the data_thread.c code as it was originally, a POLLIN with bytes_read==0 would have functioned as expected. But a POLLIN | POLLHUP with bytes_read==anything would have resulted in aborting the connection altogether without processing any of the data that had already been read. By adding a check for POLLHUP within the POLLIN handling, aborting the connection is avoided and the data is processed normally.

Are you saying that even if POLLIN | POLLHUP is received and all of the data is read from the socket, there is still more data on the socket and a subsequent read must still be done until bytes_read==0? I guess the Curl guy just decided to treat POLLIN == POLLHUP. Does that seem safe for all platforms?

If my assumptions are incorrect, which it looks like they are, then it seems to me that going back to your original patch would be the best solution. Thoughts?

Brad
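The approach discussed in this exchange (keep reading until read() returns zero bytes, and do not treat POLLHUP alone as end of data) can be sketched as follows. This is a Python illustration of the idea, not the gmetad data_thread.c code or either of the patches mentioned; `read_all` is a hypothetical helper, and the 1023-byte chunk size is borrowed from the message above.

```python
import select
import socket


def read_all(sock, chunk=1023, timeout_ms=1000):
    """Drain a socket until recv() returns b"" (EOF).

    POLLHUP is deliberately not used as a stop condition: data may still be
    queued when the peer has already hung up, so only a zero-length read
    ends the loop.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    data = bytearray()
    while True:
        events = poller.poll(timeout_ms)
        if not events:
            break  # timed out waiting for more data
        _, revents = events[0]
        if revents & (select.POLLIN | select.POLLHUP):
            buf = sock.recv(chunk)
            if not buf:
                break  # zero bytes read: genuine end of stream
            data.extend(buf)
        else:
            break  # POLLERR or POLLNVAL: give up on this connection
    return bytes(data)


if __name__ == "__main__":
    writer, reader = socket.socketpair()
    writer.sendall(b"<GANGLIA_XML>" + b"x" * 5000 + b"</GANGLIA_XML>")
    writer.close()  # the reader may now see POLLIN | POLLHUP with data pending
    print(len(read_all(reader)))
    reader.close()
```

The design choice is that the zero-length read is the only EOF signal, which should behave the same on platforms that report POLLIN alone and on those that report POLLIN | POLLHUP, at the cost of one extra read call.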
Re: [Ganglia-general] Ganglia 3.0.5 final RC
Bernard, No go. This doesn't have the patch that I sent to work the OSX issues in gmetad. It does have the suggestion by Brad, of putting an if statement in the read loop to test for the POLLHUP. However, from the previous beta (3.0.5 on ~ Sept 10th) testing cycle and my email response back to the list after that beta, his suggestion doesn't work on OSX. The reason is that the kernel is done reading off the socket and sets the POLLHUP flag BEFORE gmetad finishes reading the entire buffer. Thus, by breaking out of the read loop before the entire buffer is read, we get an incomplete message, and thus the messages are discarded by the XML parser. The discarded messages result in incorrect display in the ganglia PHP, by stating that machines are down, gaps in monitoring, etc. Sorry. RC is a no go on OSX.

Mike

On Sep 17, 2007, at 2:55 PM, Bernard Li wrote:

Dear all: This is absolutely the last RC for Ganglia 3.0.5 -- it has Brad Nicholes' fix for the Mac OSX issue, so if folks who have access to Mac OSX (both x86 and ppc) could please test this and report back success/failures, we can then make this the official release. As usual, the tarball and SRPM are available here: http://www.therealms.org/oss/ganglia/testing/ Thanks for your attention. Cheers, Bernard

On 9/7/07, Bernard Li [EMAIL PROTECTED] wrote:

Dear all: The final release candidate for Ganglia 3.0.5 is now available: http://therealms.org/oss/ganglia/testing/
i686 RPMs are built on Fedora Core 6 x86
ppc64 RPMs are built on Fedora Core 7 ppc64 (Sony PlayStation 3)
To test, please either use the prebuilt binaries, rebuild the SRPM or build from source. If you encounter any issues, please drop us a line at ganglia-developers.

There are only two changes since the last RC:
- Added README for building Ganglia 3.0.x on Windows/Cygwin
- Resolve gmetad issue on Mac OSX (Mike Walker): http://www.mail-archive.com/ganglia- [EMAIL PROTECTED]/msg03014.html

For the full log of changes since 3.0.4, please see the ChangeLog file in the tarball.

P.S. This will be the last release of Ganglia 3.0.x -- the next major release will be 3.1.0, which will see some infrastructure overhaul and new exciting features -- stay tuned!

Enjoy! Bernard
[Ganglia-general] No RRDs Created On MacOSX
Ok, after searching and testing different options, I am breaking down and asking for help :)

Background: Running MacOSX 10.4.5, Ganglia 3.0.2 (verified with gmond -v and gmetad -v), rrdtool version 1.2.12. I am just trying to test this on a local machine, but am running into problems.

1) Running gmond on test machine. Tested with 'telnet localhost 8649'; the XML output looks good and I get values for the various METRIC entries.

2) Running gmetad on test machine. After creating the rrds path (/var/lib/ganglia/rrds) and changing ownership of the rrds folder to nobody, I get no errors on launching gmetad. Tested with 'telnet localhost 8651' (or port 8652), I get the XML output, but no grid data between <GRID> and </GRID>. Also, NO RRD files are created. Of course, if I edit config.php and get the web interface running, I get no data or plots (obviously).

However, if I run gstat -a I do get the data I would expect. But when I run anything with gmetric I get nothing (no errors, no output). Of course I might be doing gmetric wrong, so here is what I tried: 'gmetric -n mem_free -v mem_free -t uint32'

I am at a loss as to how to continue debugging this problem. Running interactively, neither gmond nor gmetad displays anything that jumps out at me. Any ideas on where to look or how to debug?

Thanks, Mike
[Ganglia-general] host characteristics, hyperthreading and load sampling
Hi All, I'm new to the ganglia/gexec community and am interested in a few basics to start: I have set up a 16-node 2-CPU cluster for ganglia/gexec testing, running SuSE 9.1 w/ the 2.6.4-52-bigsmp kernel. So far all seems to be running fine and I get the expected results.

First, is there a way that one can characterize the hosts so that gexec/gmond see them as multiple systems? In other words, when I try to submit gexec -n 17 hostname I get "Not enough hosts available", although there are 32 CPUs available. My applications require fairly loaded (in the memory sense) servers, so I tend to use each CPU as a separate system. Also, as the CPUs in this cluster are hyperthreaded, the hosts are reported as 4-CPU machines...

Second, what is the mechanism that gmond uses to sense load on each system? (I'd rather not paw through the source.) I need to set up nearly instantaneous load reporting, a la vmstat, in order to properly assign jobs to candidate machines, without getting SGE-style host pileup effects ;-) As a test, I submitted 4 large jobs via gexec (as gexec -n 1 jobname) in not-so-rapid succession, and they all ended up on the same host, so I'm assuming there is some lag in gmond reporting the least-loaded target host. Any ideas for improving this?

All in all, this is a great project and I look forward to participating in the future. Regards, Mike
[Ganglia-general] Issues with 2.6.3 kernel
We've moved from 2.4.25 to 2.6.3 and the nodes of our cluster can no longer communicate. A node running 2.6.3 can get stats from a 2.4.25 node, but not the other way around. The 2.6.3 was configured using the 2.4.25 config file as a base, so all of the network settings are the same. There seems to be something funky going on with the multicast support in 2.6.3 and ganglia. Any thoughts/known workarounds? Thanks! -Mike
[Ganglia-general] newbie question
I am just starting to work with ganglia and have one question. I am working to set up a cluster where systems reside on two subnets. I changed the 'mcast_ttl' value from 1 to 16. However, the cluster members on subnet A do not see the cluster members on subnet B. Is there something I am missing? The only other thing I can think of is whether or not the routers propagate multicast traffic. Thanks, Mike O'Donnell
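For context on what a TTL setting like the one above controls: the multicast TTL is a per-socket option on the sender that limits how many router hops a packet may cross. It does not by itself make routers forward multicast; they still need multicast routing enabled between the subnets. A minimal Python sketch of setting and reading back that option follows; `make_mcast_sender` is an illustrative name, and this is not how gmond itself is configured.

```python
import socket
import struct


def make_mcast_sender(ttl):
    """Create a UDP socket whose outgoing multicast packets carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # The TTL is passed as a single unsigned byte, the usual idiom for this option.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock


if __name__ == "__main__":
    sender = make_mcast_sender(16)
    # Reading the option back confirms what the kernel will use for this socket.
    print(sender.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL))
    sender.close()
```

A TTL of 1 keeps packets on the local subnet. Any larger value only permits forwarding, so if members on the other subnet still see nothing, the routers are the next thing to check.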
[Ganglia-general] ganglia newbie
Greetings, I have just installed and configured ganglia and I have one question. The documentation that I have found is a bit sparse, so I am looking through the code to get some answers. I do have one specific question: I have several systems on the same subnet and I want to set up two different clusters. I modified gmond.conf and changed the 'name' value. One cluster is named Cluster1, the other is Cluster2. However, all systems running gmond see all the systems in both clusters. What have I missed? Also, any pointers to good information sources outside the official site would be helpful, if they exist. Thanks, Mike O'Donnell
[Ganglia-general] Re: slackware 8
Aaron Lott ([EMAIL PROTECTED]) said: When I try to telnet to localhost from the queen node, I get the xml specs, but the I always get -Connection is closed by foreign host. Is this correct? I have no idea what is wrong with my setup. --snip-- /CLUSTER /GANGLIA_XML Connection closed by foreign host. is perfectly correct. Mike
[Ganglia-general] Re: flaw in multicast setup code?
matt massie ([EMAIL PROTECTED]) said: have you tried to run # route add -host 239.2.11.71 dev eth0 before you start gmond? nope; haven't tried that til now.. I guess I should RTFM eh? see http://ganglia.sourceforge.net/docs/faq.html#AEN587 does this solve your problem? it definitely does... in future releases i will do the rnnetlink() magic necessary to make this automagic. having the magic in gmond would be awesome... thanks for all your great work! Mike
[Ganglia-general] Re: Collect ps info using ganglia?
This could be a very powerful feature. Although transmitting each node's process list could be a little heavy handed, as many of the processes that run on a given node are just noise for a person that is monitoring the progress of a particular cluster-wide job... so maybe have the 10 processes with the highest usage? But this isn't _always_ going to yield the process one might be interested in.. All this said, I think that ganglia's strength is its efficiency... if every node's process list gets mcast across the network in addition to the existing metric traffic, ganglia _might_ choke the network a bit more than some would like.

Mike

Asaph Zemach ([EMAIL PROTECTED]) said:

How about extending ganglia to collect ps information? Suppose we add to the XML something like:

<!ELEMENT PROCESS EMPTY>
<!ATTLIST PROCESS NAME CDATA #REQUIRED
                  USER CDATA #REQUIRED
                  PID CDATA #REQUIRED
                  CPU CDATA #REQUIRED
                  MEM CDATA #REQUIRED
                  SZ CDATA #REQUIRED
                  RSS CDATA #REQUIRED
                  STATUS CDATA #REQUIRED
                  ... whatever else looks useful>

And the per-node output would look like:

<HOST NAME="compute-0-2" IP="10.255.255.252" REPORTED="1013270664">
<METRIC NAME="mem_free" VAL="475380" TYPE="uint32" UNITS="KBs" SOURCE="gmond"/>
[...]
<METRIC NAME="os_release" VAL="2.4.9-13smp" TYPE="string" UNITS="" SOURCE="gmond"/>
<PROCESS NAME="mozilla-bin" USER="asaph" PID="13845" CPU="12.4" MEM="22.3" SZ="62008" RSS="55352" STATUS="S"/>
[...]
<PROCESS NAME="/bin/csh" USER="asaph" PID="13840" CPU="0.0" MEM="0.3" SZ="3872" RSS="2259" STATUS="S"/>
</HOST>

We could then easily implement a cluster-wide ps utility. On the negative side, this style of implementation would tend to return stale information: you wouldn't want to broadcast this information more than once every few seconds, so anybody using the feature would always be seeing the state of the processes as they were a few seconds ago. On the plus side this gives us a bound on the bandwidth consumed by the cluster-wide ps function.
We know that no matter how many people retrieve the cluster-wide ps information we will not consume more than N*process_list_size/sample_rate of bandwidth. Moreover, since applications running on clusters tend to be long lived, perhaps using somewhat stale information is no big deal. Thoughts?

Asaph

On Tue, Apr 09, 2002 at 12:53:29PM -0700, matt massie wrote:

asaph- this is a much better way of collecting the metrics on linux. i like that your method eliminates 3 threads and all the mutex locking. i'll try out the code and likely include it in the next release. -matt

Today, Asaph Zemach wrote forth saying...

Here is a drop-in replacement to linux.c that does not use the extra threads and gets rid of the now-unneeded locking. It seems to work. I think it's a little cleaner and more maintainable (e.g. no forgotten locking) for the future. Decide if you want to keep it. Asaph

--

#include <time.h>
#include "ganglia.h"
#include "metric_typedefs.h"
/* #include "set_metric_val.h" */

#define OSNAME "Linux"
#define OSNAME_LEN strlen(OSNAME)

/* Never changes */
char proc_cpuinfo[BUFFSIZE];
char proc_sys_kernel_osrelease[BUFFSIZE];

typedef struct {
   int last_read;
   int thresh;
   char *name;
   char buffer[BUFFSIZE];
} timely_file;

timely_file proc_stat    = { 0, 15, "/proc/stat" };
timely_file proc_loadavg = { 0, 15, "/proc/loadavg" };
timely_file proc_meminfo = { 0, 30, "/proc/meminfo" };

char *update_file(timely_file *tf)
{
   int now, rval;
   now = time(0);
   if(now - tf->last_read > tf->thresh) {
      rval = slurpfile(tf->name, tf->buffer, BUFFSIZE);
      if(rval == SYNAPSE_FAILURE) {
         err_msg("update_file() got an error from slurpfile() reading %s", tf->name);
      }
      else tf->last_read = now;
   }
   return tf->buffer;
}

/*
 * This function is called only once by the gmond. Use to
 * initialize data structures, etc or just return SYNAPSE_SUCCESS;
 */
g_val_t metric_init(void)
{
   g_val_t rval;

   rval.int32 = slurpfile("/proc/cpuinfo", proc_cpuinfo, BUFFSIZE);
   if ( rval.int32 == SYNAPSE_FAILURE ) {
      err_msg("metric_init() got an error from slurpfile() /proc/cpuinfo");
      return rval;
   }

   rval.int32 = slurpfile( "/proc/sys/kernel/osrelease", proc_sys_kernel_osrelease, BUFFSIZE);
   if ( rval.int32 == SYNAPSE_FAILURE ) {
      err_msg("kernel_func() got an error from slurpfile()");
      return rval;
   }

   /* Get rid
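The idea behind timely_file in the code above is to re-read a /proc file only when the cached copy is older than a per-file threshold, so no background threads or locks are needed. A rough Python rendering of that idea follows. It is only an illustration under assumptions: the `TimelyFile` class and its injectable `clock` parameter are hypothetical and not part of gmond, and error reporting is reduced to a print.

```python
import time


class TimelyFile:
    """Cache a file's contents and re-read it only after `thresh` seconds."""

    def __init__(self, path, thresh, clock=time.time):
        self.path = path
        self.thresh = thresh
        self.clock = clock    # injectable so the caching can be exercised without sleeping
        self.last_read = 0    # 0 forces a read on the first call, as in the C struct
        self.buffer = ""

    def update(self):
        now = self.clock()
        if now - self.last_read > self.thresh:
            try:
                with open(self.path) as handle:
                    self.buffer = handle.read()
                self.last_read = now  # only advance the timestamp after a good read
            except OSError as exc:
                print(f"update() could not read {self.path}: {exc}")
        return self.buffer


if __name__ == "__main__":
    loadavg = TimelyFile("/proc/loadavg", 15)
    print(loadavg.update())  # reads the file
    print(loadavg.update())  # served from the cache for the next 15 seconds
```

As in the C version, a failed read leaves the old buffer and timestamp in place, so the next call retries immediately.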
[Ganglia-general] Re: gmond 2.2.2 seg faults.
no dice... the problem is extremely random.. and the elusive core file isn't helping. As I said in a previous post, it segfaults on 50% of my attempts at starting gmond 2.2.2; and strace and gdb must be delaying the threads just enough to help them keep on keeping on. These systems are SMP P3 1Ghz, 2G ram... it's extremely doubtful system speed matters though. Could it be a subtle library incompatibility? like foo-1.1.2 works but foo-1.1.1 causes a malloc error?

Mike

Neil Spring ([EMAIL PROTECTED]) said:

I'm still guessing, but perhaps

cd /tmp
ulimit -c
`which gmond` --debug_level=1 -i eth0
gdb `which gmond` core

to get to some directory to which the user 'nobody' can write; perhaps gmond is not able to dump a core file in ~root after having setuid'd to nobody. -neil

On Sun, Apr 07, 2002 at 01:13:11PM -0400, Mike Snitzer wrote:

I can't get gmond to drop a core file when it seg faults... I used ulimit to set core to unlimited:

[EMAIL PROTECTED] ~]# ulimit -a
core file size (blocks)     unlimited
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
max locked memory (kbytes)  unlimited
max memory size (kbytes)    unlimited
open files                  1024
pipe size (512 bytes)       8
stack size (kbytes)         8192
cpu time (seconds)          unlimited
max user processes          16383
virtual memory (kbytes)     unlimited

Any ideas? Mike

Neil Spring ([EMAIL PROTECTED]) said:

On Sat, Apr 06, 2002 at 05:40:59PM -0500, Mike Snitzer wrote:

Any recommendations for accurately debugging gmond would be great; cause when running through strace and gdb I can't get it to segfault.

you might have already tried this, but

unlimit core (or ulimit -c for bash)
`which gmond` --debug_level=1 -i eth0
gdb `which gmond` core

or is gdb unable to sort out the threads? -neil
[Ganglia-general] gmond 2.2.2 seg faults.
gmond segfaults 50% of the time at startup. The random nature of it suggests to me that there is a race condition when the gmond threads start up. When I tried to strace gmond or run it through gdb, the problem wasn't apparent.. which is what led me to believe it's a threading problem that strace or gdb masks. Any recommendations for accurately debugging gmond would be great; cause when running through strace and gdb I can't get it to segfault. FYI, I'm running gmond v2.2.2 on 48 nodes; on 16 of those nodes gmond segfaulted at startup... Mike ps. here's an example: `which gmond` --debug_level=1 -i eth0 mcast_listen_thread() received metric data cpu_speed mcast_value() mcasting cpu_user value 2051 pre_process_node() remote_ip=192.168.0.28 encoded 8 XDR bytes pre_process_node() has saved the hostname pre_process_node() has set the timestamp pre_process_node() received a new node XDR data successfully sent set_metric_value() got metric key 11 set_metric_value() exec'd cpu_nice_func (11) Segmentation fault
[Ganglia-general] gmond default mcast interface?
All, While getting ganglia 2.2.1 going on a cluster I noticed gmond -h stated: -i, --mcast_if set the interface gmond is to multicast on default: first interface e.g. eth0 this however does not appear to be the case; as the multicast was going out eth1. So I was only seeing the master node in the php-rrd-client. As soon as I used: gmond -i eth0 all the nodes in the cluster were viewable through the php-rrd-client. I've yet to get around to hacking the gmond source; but figured I'd first mail the list to see if others have seen eth0 not being used as the default multicast interface. Thanks, Mike