[Ganglia-general] Nfs

2021-04-02 Thread mike abcdefg
Sorry if this is a dead horse,
but how can I get stats on NFS mounts?
I tried to edit the local mounts line to include nfs, but I still don't see
them.
I'm running RHEL and Ganglia 3.7.1.
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
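
One possible approach to per-mount NFS statistics is a small gmond Python metric module instead of a change to the built-in disk metrics. The following is only a minimal sketch under stated assumptions: it assumes a Linux host where /proc/mounts is readable and modpython is enabled; the metric name prefix nfs_used_percent_ and all helper names are hypothetical, not part of Ganglia; and a matching .pyconf entry per metric would still be needed.

```python
import os

# Hypothetical metric name prefix; this is not a built-in Ganglia metric.
PREFIX = "nfs_used_percent_"
_mounts = {}  # metric name -> mount point


def parse_nfs_mounts(mounts_text):
    """Return the mount points of NFS filesystems found in /proc/mounts text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def percent_used(mount_point):
    """Percentage of blocks in use on the filesystem holding mount_point."""
    st = os.statvfs(mount_point)
    if st.f_blocks == 0:
        return 0.0
    return 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks


def nfs_used_handler(name):
    """Callback that gmond invokes with the metric name."""
    return percent_used(_mounts[name])


def metric_init(params):
    """Build one metric descriptor per NFS mount present at start-up."""
    with open("/proc/mounts") as fh:
        points = parse_nfs_mounts(fh.read())
    descriptors = []
    for point in points:
        name = PREFIX + (point.strip("/").replace("/", "_") or "root")
        _mounts[name] = point
        descriptors.append({
            "name": name,
            "call_back": nfs_used_handler,
            "time_max": 90,
            "value_type": "float",
            "units": "%",
            "slope": "both",
            "format": "%f",
            "description": "Space used on NFS mount " + point,
            "groups": "nfs",
        })
    return descriptors


def metric_cleanup():
    """Nothing to release in this sketch."""
    pass
```

Note that os.statvfs can block if the NFS server is unreachable, so a production module would probably want a timeout around the call.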


[Ganglia-general] Python modules, NVIDIA, modpython.conf

2014-07-08 Thread Mike Johnson
Hi

I'm having some issues in configuring the python modules for Ganglia on an
Ubuntu 14.04 box.  It has the standard install of gmond (3.5.0) from
packages as well as the additional modules and python modules from packages.

Problem is, the additional modules work but it doesn't look like the python
modules work.  These would be the ones downloaded from
https://github.com/ganglia/gmond_python_modules/

I've included the following in gmond.conf and tried it with and without:

   module {
     name = python_module
     path = /usr/lib/ganglia/modpython.so
     params = /usr/lib/ganglia/python_modules/
   }

In particular, trying to include mod_python.conf anywhere (conf.d) causes
an error 'no such option 'param''.

I've amended paths where necessary to match the directory in which the
python module files are created: /usr/lib/ganglia/python_modules.

It works on a 12.04 box set up by my predecessor.  I'm wondering if there
was something that needed to be fixed that wasn't documented or whether
it's version-specific.

Any help anyone can provide would be much appreciated.  I've used Ganglia
several times in the past and I can't really imagine a cluster without it.

Cheers
Mike


[Ganglia-general] cygwin 1.7 + ganglia 3.5

2013-09-24 Thread mike mike
Dear All

I am trying to compile Ganglia 3.5 on Cygwin 1.7 and am facing the same
issue as described on GitHub:

https://github.com/ganglia/monitor-core/issues/96

Is there any solution for this?

Thanks for any suggestions!

Best Regard!
 Mike


[Ganglia-general] compiling on cygwin 1.7

2013-09-18 Thread mike mike
Dear All

I am trying to compile Ganglia 3.5 or 3.6 on Cygwin but get the following error.
I have seen the previous thread and followed the procedure, with no luck. Thanks
for any suggestions!

cygwin 1.7.25

libexpat-devel  2.1.0-3
libexpat1       2.1.0-3
libapr1         1.4.8-1
libapr1-devel   1.4.8-1
gcc-core        4.7.3-1
gcc-g++         4.7.3-1
libgcc1         4.7.3-1
libpcre-devel   8.33-1
pkg-config      0.23b-10
python          2.7.3-1
sunrpc          4.0-3
libpcre-devel   8.33-1
libpcre1        8.33-1


I compiled and installed confuse-2.7:
./configure --disable-nls
make
make install

Then I tried to compile Ganglia:
./configure --with-libconfuse=/usr/local --without-libpcre --enable-static-build

and got the following error when compiling libmetrics:

make[4]: Entering directory `/home/mike/ganglia-3.6.0/libmetrics/cygwin'
/bin/sh ../libtool --tag=CC --mode=compile gcc -std=gnu99
-DHAVE_CONFIG_H -I. -I.. -I.. -I../../lib -I../../include -g -O2 -Wall
-MT metrics.lo -MD -MP -MF .deps/metrics.Tpo -c -o metrics.lo metrics.c
libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -I.. -I../../lib
-I../../include -g -O2 -Wall -MT metrics.lo -MD -MP -MF .deps/metrics.Tpo
-c metrics.c  -DDLL_EXPORT -DPIC -o .libs/metrics.o
In file included from /usr/include/cygwin/in.h:267:0,
 from /usr/include/netinet/in.h:14,
 from ../unpifi.h:22,
 from ../interface.h:10,
 from metrics.c:33:
/usr/include/cygwin/in6.h:75:8: error: redefinition of ‘struct in6_addr’
In file included from
/usr/lib/gcc/i686-pc-cygwin/4.7.3/../../../../include/w32api/mprapi.h:10:0,
 from
/usr/lib/gcc/i686-pc-cygwin/4.7.3/../../../../include/w32api/iprtrmib.h:9,
 from
/usr/lib/gcc/i686-pc-cygwin/4.7.3/../../../../include/w32api/iphlpapi.h:13,
 from metrics.c:18:
/usr/lib/gcc/i686-pc-cygwin/4.7.3/../../../../include/w32api/ras.h:19:16:
note: originally defined here
metrics.c: In function ‘proc_run_func’:
metrics.c:720:46: warning: variable ‘cProcesses’ set but not used
[-Wunused-but-set-variable]
Makefile:259: recipe for target `metrics.lo' failed
make[4]: *** [metrics.lo] Error 1
make[4]: Leaving directory `/home/mike/ganglia-3.6.0/libmetrics/cygwin'
Makefile:361: recipe for target `all-recursive' failed
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/home/mike/ganglia-3.6.0/libmetrics'
Makefile:247: recipe for target `all' failed
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/mike/ganglia-3.6.0/libmetrics'
Makefile:370: recipe for target `all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/mike/ganglia-3.6.0'
Makefile:287: recipe for target `all' failed
make: *** [all] Error 2


Many thanks for any suggestion!

Best Regard!
 Mike


Re: [Ganglia-general] custom python module graphs update values only on gmond restart

2011-12-08 Thread Mike Broers
The root of the problem was in my python, once I moved the connection
creation statement into the metric handler method the graphs updated as
expected.
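
The fix described above can be sketched as follows. This is a minimal illustration, not the poster's actual final code: the DSN, the query, and the "minus one" adjustment are taken from the quoted module below, while the `connect` parameter and the `_connect` helper are hypothetical additions so the handler can be exercised without a database. A likely explanation for the stale values (not confirmed in the thread) is that a long-lived connection stays inside one open transaction, and PostgreSQL keeps serving the same statistics snapshot until that transaction ends.

```python
PG_DSN = "dbname=qa host=localhost user=postgres port=6543 password="

# Query from the original post; the current_query column is an assumption
# that holds for the older PostgreSQL releases of that time.
ACTIVE_SQL = (
    "select count(*)::integer as count from pg_stat_activity "
    "where current_query <> '<IDLE>' "
    "and current_query <> '<IDLE> in transaction'"
)


def _connect():
    """Open a fresh connection; imported lazily so tests need no psycopg2."""
    import psycopg2
    return psycopg2.connect(PG_DSN)


def pg_active(name, connect=_connect):
    """Metric handler: opens its own connection on every collection."""
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(ACTIVE_SQL)
        (count_active,) = cur.fetchone()
        cur.close()
    finally:
        # Closing the connection ends the transaction, so the next call
        # sees fresh statistics.
        conn.close()
    # As in the original module, subtract one (presumably to exclude
    # the monitoring session itself).
    return int(count_active) - 1


def metric_init(params):
    """Return the metric descriptors, as gmond expects."""
    return [{
        "name": "Pypg_active_sessions",
        "call_back": pg_active,
        "time_max": 90,
        "value_type": "uint",
        "units": "Sessions",
        "slope": "both",
        "format": "%u",
        "description": "PG Active Sessions",
        "groups": "Postgres",
    }]


def metric_cleanup():
    """Nothing to clean up: no connection outlives a handler call."""
    pass
```

gmond calls the handler with the metric name only, so the default `connect` is used in production; opening a connection every 10 seconds is a trade-off of simplicity against connection overhead.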

On Wed, Dec 7, 2011 at 11:32 AM, Mike Broers mbro...@gmail.com wrote:

 I created a python module to graph the results of a postgres query.  When
 I invoke the python program manually by calling python postgres.py I get the
 results I expect (they change).  When I put the module and pyconf into the
 ganglia folders and restart gmond I get a graph that stays constant until I
 restart gmond again.  Whenever I restart gmond, the values get updated and
 the graphs change, but then remain constant until I restart gmond again.

 Here are the .py and .pyconf files. I'm unclear whether there is a conf or
 update that needs to take place elsewhere to get these new python module
 metrics to start collecting based on the interval, perhaps on the gmetad
 side? I have collect_every = 10, so I would assume it knows to collect
 more than once.

 #
 # postgres.py
 #

 import psycopg2

 # Set up the postgres connection (opened once, when the module is imported)
 pgdsn = "dbname=qa host=localhost user=postgres port=6543 password="
 db_conn = psycopg2.connect(pgdsn)


 def pg_active(name):
     pg_active_sql = ("select count(*)::integer as count from "
                      "pg_stat_activity where current_query <> '<IDLE>' and "
                      "current_query <> '<IDLE> in transaction'")

     db_curs = db_conn.cursor()
     db_curs.execute(pg_active_sql)
     pg_active_sql_results = db_curs.fetchall()
     db_curs.close()

     (count_active,) = pg_active_sql_results[0]
     # Subtract one, presumably to exclude this monitoring session
     pg_active_count = int(count_active) - 1
     return pg_active_count


 def metric_init(params):
     global descriptors

     d3 = {'name': 'Pypg_active_sessions',
           'call_back': pg_active,
           'time_max': 90,
           'value_type': 'uint',
           'units': 'Sessions',
           'slope': 'both',
           'format': '%u',
           'description': 'PG Active Sessions',
           'groups': 'Postgres'}

     descriptors = [d3]
     return descriptors


 def metric_cleanup():
     '''Clean up the metric module.'''
     db_conn.close()


 # This code is for debugging and unit testing
 if __name__ == '__main__':
     metric_init({})
     for d in descriptors:
         v = d['call_back'](d['name'])
         print('value for %s is %u' % (d['name'], v))
 #
 # postgres.pyconf
 #
 modules {
   module {
     name = "postgres"
     language = "python"
   }
 }

 collection_group {
   collect_every = 10
   time_threshold = 50
   metric {
     name = "Pypg_active_sessions"
     title = "Postgres Active Sessions"
     value_threshold = 1
   }
 }


 Thanks for reviewing!
 Mike



Re: [Ganglia-general] gmetad polling another gmetad data source broken in 3.2.0?

2011-11-01 Thread Mike Ellis
Mark Wagner mwagner at intelius.com writes:

 
 This is the patch I ended up using:

 diff -urN ganglia-3.2.0.dist/gmetad/process_xml.c ganglia-3.2.0/gmetad/process_xml.c
 --- ganglia-3.2.0.dist/gmetad/process_xml.c 2011-07-07 08:44:35.0 -0700
 +++ ganglia-3.2.0/gmetad/process_xml.c  2011-10-21 15:18:31.0 -0700
 @@ -1172,6 +1172,7 @@
      {
         case GRID_TAG:
            rc = endElement_GRID(data, el);
 +          rc = endElement_CLUSTER(data, el);
            break;

         case CLUSTER_TAG:



Seems to be working quite nicely so far (though much work remains on my side).
Thanks for the info and quick response.

Has this already been accepted as a patch upstream? Anything we can do to ensure
others don't run into this issue will surely be appreciated.

thanks again,

 -- MikeE




Re: [Ganglia-general] Ganglia Installation Issues

2010-12-07 Thread Mike
Hi Antonio,

   I can finally see Ganglia up and running in the web interface. I restarted
everything and now it's fine.
Thanks a lot for your help.
Now I am looking at monitoring Hadoop using Ganglia. I added the metrics
properties to the hadoop-metrics properties. Is there something else I have to
do to see the Hadoop metrics in Ganglia?

Thanks,
Mike

--- On Tue, 12/7/10, Antonio Óscar Balmaseda antonio.o.balmas...@gmail.com 
wrote:

From: Antonio Óscar Balmaseda antonio.o.balmas...@gmail.com
Subject: Re: [Ganglia-general] Ganglia Installation Issues
To: Mike nano_kol...@yahoo.com
Cc: Ganglia ganglia-general@lists.sourceforge.net
Date: Tuesday, December 7, 2010, 11:37 AM



2010/12/6 Mike nano_kol...@yahoo.com


Yes, I have the web folder copied to /var/www/ganglia.
Do we have to keep this in gmond.conf?

tcp_accept_channel {
  port = 8649
}

Trying to start gmond with this included in the conf gave me the error
"Unable to create tcp_accept_channel", so I removed it from gmond.conf.



Thanks,
Mike

If I'm not mistaken, this error is shown when gmond is already running. You
have to keep these lines for the system to work. Try re-adding the lines, then
stop gmond and start it again; it should work fine.



One question: you can see the Ganglia website in any case, can't you?

Regards,
Antonio

Re: [Ganglia-general] Ganglia Installation Issues

2010-12-06 Thread Mike
Hi Antonio,

  Thanks very much for your response. I ran /usr/sbin/update-rc.d -f gmond
defaults and /usr/sbin/update-rc.d -f gmetad defaults, which initially gave me
the error: update-rc.d: /etc/init.d/gmond: file does not exist. The init script
was in /etc/rc.d/init.d/, so I copied it to /etc/init.d/gmond. After that,
update-rc.d ran fine.

I cannot view the web interface when I point my browser to
http://ip_address/ganglia/; I get "The server at ip_address is taking too long
to respond."

Here is some relevant information:

A)  I start gmond with /usr/sbin/gmond. When I telnet to EC2Ip_address 8649,
     it gives
HOST NAME=EC2Ip_address IP=10.251.86.192 REPORTED=1291666741 TN=30 
TMAX=20 DMAX=0 LOCATION=EC2Ip_address GMOND_STARTED=129121

B)  /usr/sbin/gmond -d 10 gives me this: 

Got a heartbeat message 1291666768

    metric 'cpu_user' being collected now
    metric 'cpu_user' has value_threshold 1.00
    metric 'cpu_system' being collected now
    metric 'cpu_system' has value_threshold 1.00
    metric 'cpu_idle' being collected now
    metric 'cpu_idle' has value_threshold 5.00
    metric 'cpu_nice' being collected now
    metric 'cpu_nice' has value_threshold 1.00
    metric 'cpu_aidle' being collected now
    metric 'cpu_aidle' has value_threshold 5.00
    metric 'cpu_wio' being collected now
    metric 'cpu_wio' has value_threshold 1.00
    metric 'load_one' being collected now
    metric 'load_one' has value_threshold 1.00
    metric 'load_five' being collected now
    metric 'load_five' has value_threshold 1.00
    metric 'load_fifteen' being collected now
    metric 'load_fifteen' has value_threshold 1.00
    sent message 'heartbeat' of length 56 with 0 errors
Processing a metric value message from EC2_IP
Got a heartbeat message 1291667489
and goes on

C)  /usr/sbin/gmetad -d 10 gives me this

Going to run as user nobody
Sources are ...
Source: [MyCluster, step 15] has 1 sources
    10.251.86.192
xml listening on port 8651
interactive xml listening on port 8652
cleanup thread has been started
Data thread 1147169104 is monitoring [MyCluster] data source
    10.251.86.192
[MyCluster] is a 2.5 or later data stream
hash_create size = 1024
hash-size is 1031
hash_create size = 50
hash-size is 53
hash_create size = 50
hash-size is 53
[MyCluster] is a 2.5 or later data stream
[MyCluster] is a 2.5 or later data stream
[MyCluster] is a 2.5 or later data stream
[MyCluster] is a 2.5 or later data stream
[MyCluster] is a 2.5 or later data stream
[MyCluster] is a 2.5 or later data stream
 
 ...etc

D)   Here are the relevant parts in my /etc/ganglia/gmond.conf

cluster {
  name = MyCluster
  owner = myclusterowner
  latlong = unspecified
  url = unspecified
}
host {
  location = IP_of_EC2
}

udp_send_channel {
  mcast_join = IP_of_EC2
  port = 8666
  ttl = 1
}
udp_recv_channel {
    port = 8666
    family = inet4
}

And gmetad.conf has

data_source MyCluster ipaddress:8649

Any help on this would be highly appreciated!

Thanks,
Mike
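
The telnet check in (A) can also be scripted, which makes it easier to see which hosts and how many metrics gmond is actually reporting. A minimal sketch, assuming gmond has a tcp_accept_channel listening on the given port (8649 here, as in the telnet test); the function names are illustrative and not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump gmond sends on its tcp_accept_channel until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize_hosts(xml_bytes):
    """Return (host name, IP, number of metrics) for every HOST element."""
    root = ET.fromstring(xml_bytes)
    return [
        (host.get("NAME"), host.get("IP"), len(host.findall("METRIC")))
        for host in root.iter("HOST")
    ]


if __name__ == "__main__":
    # "localhost" is a placeholder; use the address gmetad polls.
    for name, ip, n_metrics in summarize_hosts(fetch_gmond_xml("localhost")):
        print("%s (%s): %d metrics" % (name, ip, n_metrics))
```

If the connection is refused, nothing is listening on that port, which is worth comparing against the port in the gmetad data_source line.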



--- On Sun, 12/5/10, Antonio Óscar Balmaseda antonio.o.balmas...@gmail.com 
wrote:

From: Antonio Óscar Balmaseda antonio.o.balmas...@gmail.com
Subject: Re: [Ganglia-general] Ganglia Installation Issues
To: Mike nano_kol...@yahoo.com
Cc: Ganglia ganglia-general@lists.sourceforge.net
Date: Sunday, December 5, 2010, 9:57 AM


Hey, Mike,

2010/12/5 Mike nano_kol...@yahoo.com


Hi all,

   I am trying to get Ganglia running on an Ubuntu instance. I built version
3.1.7 from source, and the libs were installed in /etc/ganglia/lib64/ganglia/.

I used the command:
./configure --prefix=/etc/ganglia --with-gmetad --sysconfdir=/etc/ganglia && make && make install
and everything went fine.
I changed GMOND in gmond/gmond.init to GMOND=/etc/ganglia/sbin/gmond, and
changed GMETAD in gmetad/gmetad.init to GMETAD=/etc/ganglia/sbin/gmetad.

1. I copied gmond/gmond.init from the build directory to
/etc/rc.d/init.d/gmond, and when I start gmond using the command
/etc/rc.d/init.d/gmond start
I get the following error:
.: 9: Can't open /etc/rc.d/init.d/functions

I also copied gmetad/gmetad.init to /etc/rc.d/init.d/gmetad, and starting it
fails with the same error.
What do these scripts expect to find in /etc/rc.d/init.d/functions?

When I try something like gmond -d 1 to start gmond in the foreground, it
gives this message:
    [PYTHON] Can't open the python module path
/etc/ganglia/lib64/ganglia/python_modules. Module python_module failed to
initialize.




In this case, you have to add gmond and gmetad to the startup services. You
can do this with:

$ sudo update-rc.d -f gmond defaults
$ sudo update-rc.d -f gmetad defaults



The other problem can be solved by creating this directory or commenting out the
line that searches for it in /etc/ganglia/gmond.conf. I'm not sure if it's
necessary, but the owner of this folder is 'nobody' in my

Re: [Ganglia-general] Ganglia Installation Issues

2010-12-06 Thread Mike
Yes, I have the web folder copied to /var/www/ganglia.
Do we have to keep this in gmond.conf?

tcp_accept_channel {
  port = 8649
}

Trying to start gmond with this included in the conf gave me the error
"Unable to create tcp_accept_channel", so I removed it from gmond.conf.

Thanks,
Mike

--- On Mon, 12/6/10, Antonio Óscar Balmaseda antonio.o.balmas...@gmail.com 
wrote:

From: Antonio Óscar Balmaseda antonio.o.balmas...@gmail.com
Subject: Re: [Ganglia-general] Ganglia Installation Issues
To: Mike nano_kol...@yahoo.com
Cc: Ganglia ganglia-general@lists.sourceforge.net
Date: Monday, December 6, 2010, 9:07 PM



How is it going?

That's weird. Did you copy the files from ganglia-X.YY/web to /var/www? Because
it seems that gmond and gmetad are working fine...

Antonio.






[Ganglia-general] Ganglia Installation Issues

2010-12-04 Thread Mike
Hi all,

   I am trying to get Ganglia running on an Ubuntu instance. I built version
3.1.7 from source, and the libs were installed in /etc/ganglia/lib64/ganglia/.
I used the command:
./configure --prefix=/etc/ganglia --with-gmetad --sysconfdir=/etc/ganglia && make && make install
and everything went fine.
I changed GMOND in gmond/gmond.init to GMOND=/etc/ganglia/sbin/gmond, and
changed GMETAD in gmetad/gmetad.init to GMETAD=/etc/ganglia/sbin/gmetad.

1. I copied gmond/gmond.init from the build directory to
/etc/rc.d/init.d/gmond, and when I start gmond using the command
/etc/rc.d/init.d/gmond start
I get the following error:
.: 9: Can't open /etc/rc.d/init.d/functions
I also copied gmetad/gmetad.init to /etc/rc.d/init.d/gmetad, and starting it
fails with the same error. What do these scripts expect to find in
/etc/rc.d/init.d/functions?

When I try something like gmond -d 1 to start gmond in the foreground, it
gives this message:
    [PYTHON] Can't open the python module path
/etc/ganglia/lib64/ganglia/python_modules. Module python_module failed to
initialize.

2. I am running an EC2 instance, so I have questions about the changes in the
conf files.
 In gmetad.conf I made the following changes:
  a. data_source MyCluster 'internal IP of the instance' (or should I use the
external IP of the instance?)
  b. What should I set for the user that gmetad will setuid to (defaults to
nobody)? My rrd directory is at /var/lib/ganglia/rrds and is owned by
root. So should I set the user here to root?

 In gmond.conf I have the following:
cluster {
  name = MyCluster
  owner = MyOwner
  latlong = unspecified
  url = unspecified
}
host {
  location = unspecified
}
(Again, what should go in here?)
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
tcp_accept_channel {
  port = 8649
}

All other conf parameters are unchanged.
Please help me with this.

Thanks,
Mike




Re: [Ganglia-general] ganglia installation

2010-10-18 Thread Mike
Thanks much Bernard. I shall try from scratch installing the latest version and 
get back to you if I am stuck.





From: Bernard Li bern...@vanhpc.org
To: Mike nano_kol...@yahoo.com
Cc: Ganglia ganglia-general@lists.sourceforge.net
Sent: Thu, October 14, 2010 7:23:36 PM
Subject: Re: [Ganglia-general] ganglia installation

Hi Mike:

When responding, please make sure you reply-all so that replies are
sent back to the list.  This ensures that our discussions are archived
for future users encountering the same issue, thanks!

On Mon, Oct 11, 2010 at 2:53 PM, Mike nano_kol...@yahoo.com wrote:

 Thanks much for your response. I had tried installing version 3.1 before,
 but it didn't help. My instance is x86_64 and the ganglia README.txt says that
 Ganglia runs on Linux - i386, ia64, sparc, alpha, powerpc, m68k, mips, arm,
 hppa, s390t?  So I was skeptical about that. Sorry I am new to

Thanks for pointing that out.  I have recently fixed this in our
development branch (trunk) but forgot to backport it to our 3.0 and
3.1 trees.  I've just merged the changes to the 3.1 tree so the
documentation will be updated in the next 3.1 release:

https://sourceforge.net/apps/trac/ganglia/changeset/2349

 hadoop/ganglia and it would be really helpful if you can send me some
 tutorials where I can get the step by step process of installation.

Installing Ganglia under EC2 is no different from any other
environments, so any guide should suffice, for example this one:

http://www.jansipke.nl/installing-ganglia-on-centos

EC2-specific gotchas are mentioned here:

http://www.cultofgary.com/2008/10/16/ec2-and-ganglia/

I have added a link to this page in our Wiki page as well:
https://sourceforge.net/apps/trac/ganglia/wiki/ganglia_configuration

 Also my EC2 instance is not under any cluster name. So does it make sense to
 put some cluster name in gmetad.conf? Or the cluster name I put in here is
 independent of all those?

It's mostly for your reference.  What ends up being shown on the
frontend is actually Cluster from gmond.conf.

 Also I installed ganglia-web RPM as:
 wget -c
 http://downloads.sourceforge.net/ganglia/ganglia-web-3.1.0-1.el4.noarch.rpm
 rpm -ivh ganglia-gmetad-3.1.0-1.el4.i386.rpm
 ganglia-web-3.1.0-1.el4.noarch.rp

I can't remember if ganglia-web 3.1 is compatible with gmetad/gmond
2.5, but regardless, I think if you're just starting out, you should
use 3.1.7 across the board.  Also make sure that you have started
apache/httpd so that it will start serving pages of the frontend.

 How do I check the apache error_logs?

That is usually in /var/log/httpd.

Cheers,

Bernard





[Ganglia-general] ganglia installation

2010-10-10 Thread Mike
Hi all,

   I am trying to install ganglia on a single EC2 instance (Fedora x86_64
GNU/Linux). I want to use ganglia to monitor hadoop performance. The hadoop
installation was successful. I installed everything for ganglia (2.5.7) on this
same instance, following the steps at
http://wiki.appnexus.com/display/documentation/Monitoring+Instances+Using+Ganglia


but using x86_64 rpms instead of i386. I haven't changed anything in gmetad.conf
except for adding this:

data_source unspecified localhost

My gmond.conf has all the default values; I have modified nothing in it.
I also set up hadoop-metrics.properties as explained in
http://developer.yahoo.com/hadoop/tutorial/module7.html#ganglia with
mapred.servers, dfs.servers and jvm.servers set to localhost:8649.
When I view the page http://hostnameofEC2instance/ganglia, it doesn't display
anything; it waits for some time and then says the page cannot be displayed
because it takes too long to respond.
I am able to start gmond, gmetad etc. (service gmetad start), and can telnet to
localhost 8649 to display the XML. Also, running gmond --debug=9 or gmetad
--debug=9 keeps displaying messages.

Can anyone help me with this, if I am going wrong somewhere?

Thanks,
Michael





[Ganglia-general] Adding a custom view - how?

2008-08-20 Thread Mike Stalnaker
All;

I have a Ganglia server running 3.0.3.

In my gmetad.conf, I have data sources defined that look like:

data_source Development xx.yy.zz.aa:8650
data_source Production xx.yy.zz.bb:8650

This works great, but what I now need to do is provide some custom views for
my management, so that they can see the graphs only for hosts assigned to a
specific group ie:

Development [host1 host2 host3 host4] on one page, and
Development [host5 host6 host7 host8] on another, without disturbing the
existing pages. What's the best way to accomplish this?




[Ganglia-general] any ideas as to where to start with this error

2008-02-19 Thread Mike Olson
I have web server and I am receiving this error

There was an error collecting ganglia data (127.0.0.1:8652): XML error:
SYSTEM or PUBLIC, the URI is missing at 1

Which configuration file do I need to change, and what do I change?  I
checked /var/www/ganglia/ganglia.php, but line 1 is only the top of the
file with no config options.

Thanks for your help
Mike


[Ganglia-general] Warning: fsockopen() [function.fsockopen]

2008-02-15 Thread Mike Olson
I am currently having a problem displaying my cluster information in a
web browser.  The odd thing is that when my friend pulls my data from
my network and displays it on his Apache server, I can see it in a
browser.  The message I receive when I try to display the info from my
Apache web server is below:

Warning: fsockopen() [function.fsockopen]: unable to connect to
127.0.0.1:8652 (Connection refused) in /var/www/ganglia/ganglia.php on
line 283

There was an error collecting ganglia data (127.0.0.1:8652): fsockopen
error: Connection refused
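
"Connection refused" generally means that nothing is accepting connections on that address and port. A minimal Python sketch for checking this from the web server itself is shown here; it assumes gmetad's default ports (8651 for xml_port, 8652 for interactive_port), and the helper name port_is_open is made up for illustration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8651 and 8652 are gmetad's default xml_port and interactive_port.
    for port in (8651, 8652):
        state = "open" if port_is_open("127.0.0.1", port) else "closed or refused"
        print(f"127.0.0.1:{port} is {state}")
```

If both ports report closed on the machine that runs the web frontend, the likely places to look are whether gmetad is running there at all and which ports its gmetad.conf asks it to listen on.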

Any suggestions would be greatly appreciated.

Thanks,
Mike



[Ganglia-general] additional info about fsock open error

2008-02-15 Thread Mike Olson
Just an FYI, I have the ports 8649 to 8652 forwarded on my router to my
Apache web server.  I have looked at the file on line 283 and I don't know
what part of that line is creating the error.  The line is below:

$fp = fsockopen( $ip, $port, $errno, $errstr, $timeout);

Thanks,
Mike


[Ganglia-general] fsockopen problem

2008-02-15 Thread Mike Olson
Should I remove the webserver box as a data source from my gmond and gmetad
config files?  That box is also the ndb manager of my cluster.


[Ganglia-general] fsockopen problem

2008-02-15 Thread Mike Olson
I disabled forwarding of ports 8649-8652 on my router.  I checked the
gmond.conf and gmetad.conf files and I am not forwarding those ports to the
Apache server from those.  The Apache server is also the gmetad server.  I
tried browsing to it, but I still receive the same error message.


[Ganglia-general] fsockopen problem

2008-02-15 Thread Mike Olson
I tried browsing to both the external and the internal IP address of my
web server, and I'm getting the same error message.


[Ganglia-general] fsockopen problem

2008-02-15 Thread Mike Olson
I've changed the line at 283 and now I get this error

There was an error collecting ganglia data (127.0.0.1:8652): XML error:
SYSTEM or PUBLIC, the URI is missing at 1

What next?


Re: [Ganglia-general] [Earthlink] Re: Ganglia 3.0.5 final RC

2007-09-18 Thread Mike Walker
Only speaking for what is happening on OSX.

The original issue (before the patches):
After reading all the data from the for(;;) loop, we would read a
SYS_CALL buffer, determine that POLLHUP was set and throw out the
entire message when we set d->dead=1 and did a goto take_a_break;

Thus we were not getting any indication of an error; gmetad just
would not work correctly on OSX.


With the RC release with the patch:
A) As we go into the if (struct_poll.revents & POLLIN) and do a
SYS_CALL on 1023 bytes, we get back X bytes_read.
B) Then doing an 'if' on POLLHUP, we find that POLLHUP is set and
would normally just do a 'break', which would take us out of the
for(;;) loop and attempt to process the XML data.  However, with this
patch we get XML parser errors, and the incomplete messages are
thrown out.  (The warning occurs when running in debug mode, and
gmetad still does not work correctly on OSX.)

To test the theory:
However, IF we do another SYS_CALL for another 1023 bytes AFTER the
check for POLLHUP (and before the 'break;'), there is an additional
Y bytes read from the system socket buffer.  Thus, most of the time,
we never receive the entire message before we hit the POLLHUP break,
and thus lose the entire message.

I have only done code inspection of the OSX kernel (haven't compiled
the kernel in debug), but it appears to set POLLHUP not when the
application is done reading (as this 'if' statement assumes), nor on
a lost connection as suggested in the standard, but at some other
time, well before we are done reading valid data off the socket
buffer.

Thus, at this point, I would not even attempt to test for POLLHUP on
OSX.
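
The behaviour described above can be reproduced outside gmetad. The following is a minimal Python sketch, not the gmetad code: it assumes a local socket pair standing in for the gmond connection, and the names read_until_eof and CHUNK are made up for illustration. It keeps reading until a read returns zero bytes and treats POLLHUP only as a hint, so data still queued in the socket buffer is not thrown away.

```python
import select
import socket

CHUNK = 1023  # same read size as discussed above; any size works


def read_until_eof(sock, timeout_ms=1000):
    """Collect everything the peer sent, ending only on a zero-byte read."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    chunks = []
    while True:
        events = poller.poll(timeout_ms)
        if not events:
            break  # timed out waiting for more data
        _, revents = events[0]
        if revents & (select.POLLIN | select.POLLHUP):
            data = sock.recv(CHUNK)
            if not data:
                break  # zero bytes read: the real end of the stream
            chunks.append(data)
            continue  # POLLHUP alone is not a reason to stop reading
        break  # POLLERR or POLLNVAL: give up
    return b"".join(chunks)


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"<GANGLIA_XML>" + b"x" * 5000 + b"</GANGLIA_XML>")
    a.close()  # the peer hangs up while data is still buffered
    print(len(read_until_eof(b)))
    b.close()
```

The design choice is the one argued for in this thread: the zero-byte read, not the POLLHUP flag, decides when the message is complete.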

Did that explain what we are seeing on OSX?

Mike


On Sep 18, 2007, at 7:47 AM, Brad Nicholes wrote:


 On 9/17/2007 at 9:23 PM, in message
 [EMAIL PROTECTED], Mike Walker
 [EMAIL PROTECTED] wrote:
 Bernard,
  No go.  This doesn't have the patch that I sent to work the OSX
 issues in gmetad.  It does have the suggestion by Brad, of putting
 an if statement in the read loop to test for the POLLHUP.  However,
 from the previous beta (3.0.5  on ~ Sept 10th) testing cycle and my
 email response back to the list after that beta, his suggestion
 doesn't work on OSX.

 The reason is that the kernel is done reading off the socket and sets
 the POLLHUP flag BEFORE gmetad finishes reading the entire buffer.
 Thus, by breaking out of the read loop before the entire buffer is
 read, we get an incomplete message, and thus the messages are
 discarded by the XML parser.   The discarded messages result in
 incorrect display in the ganglia PHP, by stating that machines are
 down, gaps in monitoring, etc.


I am sure that you are correct, so help me understand what is  
 going on here.  From what I could get from Google searches,  
 different platforms indicate an EOF in different ways.  Some set  
 just POLLIN and then indicate EOF by checking bytes_read == 0 after  
 a read().  In this case an revents of POLLHUP only indicates a  
 broken connection.  However other platforms send a POLLIN | POLLHUP  
 with the POLLHUP indicating the EOF.  In this way an extra read()
 looking for bytes_read==0 would be unnecessary.  A final read() can
 be done and EOF determined all in the same operation.  In the
 data_thread.c code as it was originally, a POLLIN with
 bytes_read==0 would have functioned as expected.  But a POLLIN |
 POLLHUP with bytes_read==anything would have resulted in aborting
 the connection altogether without processing any of the data that
 had already been read.  By adding a check for POLLHUP within the
 POLLIN handling, aborting the connection is avoided and the data is
 processed normally.
Are you saying that even if POLLIN | POLLHUP is received and all  
 of the data is read from the socket, there is still more data on  
 the socket and a subsequent read must still be done until  
 bytes_read==0?  I guess the Curl guy just decided to treat POLLIN  
 == POLLHUP.  Does that seem safe for all platforms?  If my  
 assumptions are incorrect, which it looks like they are, then it  
 seems to me that going back to your original patch would be the  
 best solution.  Thoughts?

 Brad





Re: [Ganglia-general] Ganglia 3.0.5 final RC

2007-09-17 Thread Mike Walker
Bernard,
No go.  This doesn't have the patch that I sent to work the OSX
issues in gmetad.  It does have the suggestion by Brad, of putting
an if statement in the read loop to test for the POLLHUP.  However,
from the previous beta (3.0.5  on ~ Sept 10th) testing cycle and my
email response back to the list after that beta, his suggestion
doesn't work on OSX.

The reason is that the kernel is done reading off the socket and sets
the POLLHUP flag BEFORE gmetad finishes reading the entire buffer.
Thus, by breaking out of the read loop before the entire buffer is
read, we get an incomplete message, and thus the messages are
discarded by the XML parser.   The discarded messages result in
incorrect display in the ganglia PHP, by stating that machines are
down, gaps in monitoring, etc.

Sorry.  RC is a no go on OSX.

Mike

On Sep 17, 2007, at 2:55 PM, Bernard Li wrote:

 Dear all:

 This is absolutely the last RC for Ganglia 3.0.5 -- it has Brad
 Nicholes' fix for the Mac OSX issue so if folks who have access to Mac
 OSX (both x86 and ppc) please test this and report back
 success/failures, we can then make this the official release.

 As usual, the tarball and SRPM are available here:

 http://www.therealms.org/oss/ganglia/testing/

 Thanks for your attention.

 Cheers,

 Bernard

 On 9/7/07, Bernard Li [EMAIL PROTECTED] wrote:
 Dear all:

 The final release candidate for Ganglia 3.0.5 is now available:

 http://therealms.org/oss/ganglia/testing/

 i686 RPMs are built on Fedora Core 6 x86
 ppc64 RPMs are built on Fedora Core 7 ppc64 (Sony PlayStation 3)

 To test, please either use the prebuilt binaries, rebuild the SRPM or
 build from source.  If you encounter any issues, please drop us a  
 line
 at ganglia-developers.

 There are only two changes since the last RC:

 - Added README for building Ganglia 3.0.x on Windows/Cygwin
 - Resolve gmetad issue on Mac OSX (Mike Walker):
 http://www.mail-archive.com/ganglia- 
 [EMAIL PROTECTED]/msg03014.html

 For the full log of changes since 3.0.4, please see the ChangeLog  
 file
 in the tarball.

 P.S. This will be the last release of Ganglia 3.0.x -- the next major
 release will be 3.1.0 which will see some infrastructure overhaul and
 new exciting features -- stay tuned!

 Enjoy!

 Bernard






[Ganglia-general] No RRDs Created On MacOSX

2006-02-19 Thread Mike Walker

Ok,
  After searching and testing different options, I am breaking down  
and asking for help :)


Background:  Running MacOSX 10.4.5, Ganglia 3.0.2 (verified gmond -v  
and gmetad -v), rrdtool version 1.2.12


I am just trying to test this on a local machine, but am running into  
problems.


1) Running gmond on test machine
  Tested with 'telnet localhost 8649' and the XML output looks
good and I get values for the various METRIC elements.
2) Running gmetad on test machine.  After creating the rrds path
(/var/lib/ganglia/rrds) and changing ownership of the rrds folder to
nobody, I get no errors on launching gmetad.
	Tested with 'telnet localhost 8651' (or port 8652) I get the
XML output, but no grid data between <GRID> and </GRID>.
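
One way to check what gmetad is actually reporting is to parse the XML dump rather than read it by eye. The sketch below is in Python and assumes the usual GANGLIA_XML > GRID > CLUSTER > HOST > METRIC nesting; the SAMPLE document is invented for illustration, and the names fetch_xml and summarize are hypothetical helpers, not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET

# Invented sample in the shape gmetad emits; real dumps carry more attributes.
SAMPLE = """<GANGLIA_XML VERSION="3.0.2" SOURCE="gmetad">
<GRID NAME="unspecified" AUTHORITY="http://localhost/ganglia/" LOCALTIME="0">
<CLUSTER NAME="test" LOCALTIME="0">
<HOST NAME="localhost" IP="127.0.0.1" REPORTED="0">
<METRIC NAME="mem_free" VAL="475380" TYPE="uint32" UNITS="KB"/>
</HOST>
</CLUSTER>
</GRID>
</GANGLIA_XML>"""


def fetch_xml(host="localhost", port=8651, timeout=5.0):
    """Read the whole XML dump; the server closes the connection when done."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_data):
    """Count GRID, CLUSTER, HOST and METRIC elements in a dump (str or bytes)."""
    root = ET.fromstring(xml_data)
    return {tag: sum(1 for _ in root.iter(tag))
            for tag in ("GRID", "CLUSTER", "HOST", "METRIC")}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Calling summarize(fetch_xml()) against a running gmetad and getting a GRID count of one with zero CLUSTER entries would match the symptom described here: gmetad is up but has not collected anything from its data sources.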

Also, NO RRDS files are created.

Of course if I edit the config.php and get the web interface running,  
I am getting no data or plots (obviously)


However, if I run gstat -a  I do get the data I would expect.  But  
when I run anything with gmetric I get nothing (no errors no  
output).  Of course I might be doing gmetric wrong, so here is  
what I tried.


'gmetric -n mem_free -v mem_free -t uint32'

I am at a loss of how to continue to debug this problem.  Running  
interactively (both gmond and gmetad) are not displaying anything  
that jumps out at me.


Any ideas on where to look or how to debug?

Thanks,
Mike



[Ganglia-general] host characteristics, hyperthreading and load sampling

2004-09-09 Thread Mike Gehl

Hi All,

I'm new to the ganglia/gexec community and am interested in a few 
basics to start:


I have set up a 16-node 2-CPU cluster for ganglia/gexec testing, 
running SuSE 9.1 w/ the 2.6.4-52-bigsmp kernel. So far all seems to be 
running fine and I get the expected results.


First, is there a way that one can characterize the hosts so that
gexec/gmond see them as multiple systems? In other words, when I try to
submit "gexec -n 17 hostname" I get "Not enough hosts available",
although there are 32 CPUs available. My applications require fairly
loaded (in the memory sense) servers, so I tend to use each CPU as a
separate system.


Also, as the CPUs in this cluster are hyperthreaded, the hosts are 
reported as 4-CPU machines...


Second, what is the mechanism that gmond uses to sense load on each 
system, without pawing through the source? I need to set up nearly 
instantaneous load reporting, a la vmstat, in order to properly assign 
jobs to candidate machines, without getting SGE-style host pileup 
effects ;-)


As a test, I submitted 4 large jobs via gexec (as "gexec -n 1 jobname")
in not-so-rapid succession, and they all ended up on the same host,
so I'm assuming there is some lag in gmond reporting the least-loaded
target host. Any ideas on improving this?


All in all, this is a great project and I look forward to participating 
in the future.


Regards,

Mike




[Ganglia-general] Issues with 2.6.3 kernel

2004-03-03 Thread Mike Houston
We've moved from 2.4.25 to 2.6.3 and the nodes of our cluster can no 
longer communicate.  A node running 2.6.3 can get stats from a 2.4.25 
node, but not the other way around.  The 2.6.3 was configured using the 
2.4.25 config file as a base, so all of the network settings are the 
same.  There seems to be something funky going on with the multicast 
support in 2.6.3 and ganglia.


Any thoughts/known workarounds?

Thanks!
-Mike



[Ganglia-general] newbie question

2003-12-12 Thread mike . odonnell
I am just starting to work with ganglia and have one question.  I am working to
set up a cluster where systems reside on two subnets.  I changed the 'mcast_ttl'
value from 1 to 16.  However, the cluster members on subnet A do not see the
cluster members on subnet B.  Is there something I am missing?  The only other
thing I can think of is finding out whether or not the routers propagate
multicast traffic.

Thanks


Mike O'Donnell







[Ganglia-general] ganglia newbie

2003-12-10 Thread mike . odonnell
Greetings,

I have just installed and configured ganglia and I have one question.  The 
documentation that I have found is a bit sparse so I am looking through the 
code to get some answers.  I do have one specific question:   I have several 
systems on the same subnet and I want to set up two different clusters.  I 
modified gmond.conf and changed the 'name' value.  One cluster is named
Cluster1, the other is Cluster2.  However, all systems running gmond see
all systems in both clusters.  What have I missed?

Also, any pointers to good information sources outside the official site would 
be helpful if they exist

Thanks

Mike O'Donnell






[Ganglia-general] Re: slackware 8

2002-07-05 Thread Mike Snitzer
Aaron Lott ([EMAIL PROTECTED]) said:

 
 When I try to telnet to localhost from the queen node, I get the xml
 specs, but then I always get "Connection is closed by foreign host". Is this
 correct? I have no idea what is wrong with my setup.

--snip--
</CLUSTER>
</GANGLIA_XML>
Connection closed by foreign host.

is perfectly correct.

Mike



[Ganglia-general] Re: flaw in multicast setup code?

2002-04-23 Thread Mike Snitzer
matt massie ([EMAIL PROTECTED]) said:

 have you tried to run
 # route add -host 239.2.11.71 dev eth0
 
 before you start gmond?

nope; haven't tried that til now.. I guess I should RTFM eh?

 
 see http://ganglia.sourceforge.net/docs/faq.html#AEN587
 
 does this solve your problem?

it definitely does... 

 
 in future releases i will do the rtnetlink() magic necessary to make this
 automagic.

having the magic in gmond would be awesome... thanks for all your great
work!

Mike



[Ganglia-general] Re: Collect ps info using ganglia?

2002-04-09 Thread Mike Snitzer
This could be a very powerful feature.  Although transmitting each node's
process list could be a little heavy handed as many of the processes that
run on a given node are just noise for a person that is monitoring the
progress of a particular cluster-wide job... so maybe have the 10
processes with the highest usage? But this isn't _always_ going to yield
the process one might be interested in..

All this said, I think that ganglia's strength is its efficiency... if
every node's process list gets mcast across the network in addition to
the existing metric traffic; ganglia _might_ choke the network a bit more
than some would like.

Mike

Asaph Zemach ([EMAIL PROTECTED]) said:

 How about extending ganglia to collect ps information?
 Suppose we add to the XML something like:
 
 <!ELEMENT PROCESS EMPTY>
 <!ATTLIST PROCESS NAME    CDATA #REQUIRED
                   USER    CDATA #REQUIRED
                   PID     CDATA #REQUIRED
                   CPU     CDATA #REQUIRED
                   MEM     CDATA #REQUIRED
                   SZ      CDATA #REQUIRED
                   RSS     CDATA #REQUIRED
                   STATUS  CDATA #REQUIRED
                   ... whatever else looks useful
 >
  
 
 And the per-node output would look like:
 
 
 <HOST NAME="compute-0-2" IP="10.255.255.252" REPORTED="1013270664">
 <METRIC NAME="mem_free" VAL="475380" TYPE="uint32" UNITS="KBs"
 SOURCE="gmond"/>
 
 [...]
 
 <METRIC NAME="os_release" VAL="2.4.9-13smp" TYPE="string" UNITS=""
 SOURCE="gmond"/>
 
 <PROCESS NAME="mozilla-bin" USER="asaph" PID="13845" CPU="12.4" MEM="22.3"
 SZ="62008" RSS="55352" STATUS="S"/>
 
 [...]
 
 <PROCESS NAME="/bin/csh" USER="asaph" PID="13840" CPU="0.0" MEM="0.3"
 SZ="3872" RSS="2259" STATUS="S"/>
 </HOST>
 
 
 We could then easily implement a cluster-wide ps utility.
 
 On the negative side, this style of implementation would
 tend to return stale information, you wouldn't want to broadcast
 this information more than once every few seconds, so anybody
 using the feature would always be seeing the state of the processes
 as they were a few seconds ago.
 
 On the plus side this gives us a bound on the bandwidth consumed 
 by the cluster-wide ps function. We know that no matter how
 many people retrieve the cluster-wide ps information we will
 not consume more than N*process_list_size/sample_rate of bandwidth.
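
The bound above is simple arithmetic and can be written out as a small Python sketch. It assumes that "sample_rate" in the formula means the number of seconds between two broadcasts of a node's process list; the function name and the example figures are hypothetical and chosen only to show the calculation.

```python
def ps_bandwidth_bound(nodes, process_list_bytes, sample_interval_s):
    """Upper bound, in bytes per second, on cluster-wide ps traffic.

    Each of `nodes` hosts sends at most `process_list_bytes` once every
    `sample_interval_s` seconds, however many clients read the result.
    """
    if sample_interval_s <= 0:
        raise ValueError("sample interval must be positive")
    return nodes * process_list_bytes / sample_interval_s


if __name__ == "__main__":
    # Hypothetical figures: 64 nodes, 8 KiB of process data each, every 4 s.
    print(ps_bandwidth_bound(64, 8 * 1024, 4))
```

Doubling the interval halves the bound, which is the trade-off against staleness mentioned above.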
 
 Moreover, since applications running on clusters tend to be
 long lived perhaps using somewhat stale information is no
 big deal.
 
 Thoughts?
 
   Asaph
 
 
 
 On Tue, Apr 09, 2002 at 12:53:29PM -0700, matt massie wrote:
  asaph-
  
  this is a much better way of collecting the metrics on linux.  i like that 
  your method eliminates 3 threads and all the mutex locking.  i'll 
  try out the code and likely include it in the next release.
  
  -matt
  
  Today, Asaph Zemach wrote forth saying...
  
   Here is a drop-in replacement to linux.c that does not
   use the extra threads and gets rid of the now-unneeded 
   locking. It seems to work. I think it's a little cleaner
   and more maintainable (e.g. no forgotten locking) for the future.
   
   Decide if you want to keep it.
   
 Asaph
   
   
   --
   #include <time.h>
   #include "ganglia.h"
   #include "metric_typedefs.h"

   /*
   #include "set_metric_val.h"
   */

   #define OSNAME "Linux"
   #define OSNAME_LEN strlen(OSNAME)

   /* Never changes */
   char proc_cpuinfo[BUFFSIZE];
   char proc_sys_kernel_osrelease[BUFFSIZE];

   typedef struct {
     int last_read;
     int thresh;
     char *name;
     char buffer[BUFFSIZE];
   } timely_file;

   timely_file proc_stat    = { 0, 15, "/proc/stat" };
   timely_file proc_loadavg = { 0, 15, "/proc/loadavg" };
   timely_file proc_meminfo = { 0, 30, "/proc/meminfo" };

   char *update_file(timely_file *tf)
   {
     int now,rval;
     now = time(0);
     if(now - tf->last_read > tf->thresh) {
       rval = slurpfile(tf->name, tf->buffer, BUFFSIZE);
       if(rval == SYNAPSE_FAILURE) {
         err_msg("update_file() got an error from slurpfile() reading %s",
                 tf->name);
       }
       else tf->last_read = now;
     }
     return tf->buffer;
   }

   /*
    * This function is called only once by the gmond.  Use to
    * initialize data structures, etc or just return SYNAPSE_SUCCESS;
    */
   g_val_t
   metric_init(void)
   {
      g_val_t rval;

      rval.int32 = slurpfile("/proc/cpuinfo", proc_cpuinfo, BUFFSIZE);
      if ( rval.int32 == SYNAPSE_FAILURE )
         {
            err_msg("metric_init() got an error from slurpfile() /proc/cpuinfo");
            return rval;
         }

      rval.int32 = slurpfile( "/proc/sys/kernel/osrelease",
                              proc_sys_kernel_osrelease, BUFFSIZE);
      if ( rval.int32 == SYNAPSE_FAILURE )
         {
            err_msg("kernel_func() got an error from slurpfile()");
            return rval;
         }

      /* Get rid
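
The idea behind update_file() in the listing above is a per-file cache that is refreshed only when it is older than a threshold, which is what removes the need for reader threads and locks. A minimal Python sketch of the same technique follows; the class name TimelyFile and the optional `now` argument (added so the behaviour can be exercised without waiting) are assumptions for illustration and are not part of the posted C code.

```python
import time


class TimelyFile:
    """Cache a file's contents and re-read it only after `thresh` seconds."""

    def __init__(self, name, thresh):
        self.name = name
        self.thresh = thresh
        self.last_read = 0.0
        self.buffer = ""

    def update(self, now=None):
        """Return the cached contents, refreshing them first if stale."""
        now = time.time() if now is None else now
        if now - self.last_read > self.thresh:
            try:
                with open(self.name) as handle:
                    self.buffer = handle.read()
            except OSError as exc:
                # Like the C version, keep the old buffer on failure.
                print(f"update() could not read {self.name}: {exc}")
            else:
                self.last_read = now
        return self.buffer


if __name__ == "__main__":
    loadavg = TimelyFile("/proc/loadavg", 15)
    print(loadavg.update())
```

Because every metric function goes through update() from the same thread, two metrics that parse the same file share one read per threshold period.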

[Ganglia-general] Re: gmond 2.2.2 seg faults.

2002-04-07 Thread Mike Snitzer
no dice... the problem is extremely random.. and the elusive core file
isn't helping.  As I said in a previous post, it segfaults 50% of my
attempts at starting gmond 2.2.2; and strace and gdb must be delaying the
threads just enough to help them keep on keeping on. 

These systems are SMP P3 1Ghz, 2G ram... it's extremely doubtful system
speed matters though

Could it be a subtle library incompatibility? like foo-1.1.2 works but
foo-1.1.1 causes a malloc error?

Mike

Neil Spring ([EMAIL PROTECTED]) said:

 I'm still guessing, but perhaps
  cd /tmp
  ulimit -c
  `which gmond` --debug_level=1 -i eth0
  gdb `which gmond` core
 
 to get to some directory to which the user 'nobody' can
 write; perhaps gmond is not able to dump a core file in
 ~root after having setuid'd to nobody. 
 
 -neil
 
 On Sun, Apr 07, 2002 at 01:13:11PM -0400, Mike Snitzer wrote:
  I can't get gmond to drop a core file when it seg faults... I used ulimit
  to set core to unlimited:
  
  [EMAIL PROTECTED] ~]# ulimit -a
  core file size (blocks) unlimited
  data seg size (kbytes)  unlimited
  file size (blocks)  unlimited
  max locked memory (kbytes)  unlimited
  max memory size (kbytes)unlimited
  open files  1024
  pipe size (512 bytes)   8
  stack size (kbytes) 8192
  cpu time (seconds)  unlimited
  max user processes  16383
  virtual memory (kbytes) unlimited
  
  Any ideas?
  
  Mike
  
  Neil Spring ([EMAIL PROTECTED]) said:
  
   On Sat, Apr 06, 2002 at 05:40:59PM -0500, Mike Snitzer wrote:
Any recommendations for accurately debugging gmond would be great; cause
when running through strace and gdb I can't get it to segfault.
   
   you might have already tried this, but
   
   unlimit core (or ulimit -c for bash)
   `which gmond` --debug_level=1 -i eth0
   gdb `which gmond` core
   
   or is gdb unable to sort out the threads?
   
   -neil
   
  



[Ganglia-general] gmond 2.2.2 seg faults.

2002-04-06 Thread Mike Snitzer
gmond segfaults 50% of the time at startup.  The random nature of it
suggests to me that there is a race condition when the gmond threads
start up.  When I tried to strace or run gmond through gdb the problem
wasn't apparent, which is what led me to believe it's a threading problem
that strace or gdb masks.

Any recommendations for accurately debugging gmond would be great; cause
when running through strace and gdb I can't get it to segfault.

FYI, I'm running gmond v2.2.2 on 48 nodes; on 16 of those nodes gmond
segfaulted at startup...

Mike

ps.
here's an example:
`which gmond` --debug_level=1 -i eth0

mcast_listen_thread() received metric data cpu_speed
mcast_value() mcasting cpu_user value
2051 pre_process_node() remote_ip=192.168.0.28encoded 8 XDR
bytespre_process_node() has saved the hostname
pre_process_node() has set the timestamp
pre_process_node() received a new node


XDR data successfully sent
set_metric_value() got metric key 11
set_metric_value() exec'd cpu_nice_func (11)
Segmentation fault




[Ganglia-general] gmond default mcast interface?

2002-03-27 Thread Mike Snitzer
All,

While getting ganglia 2.2.1 going on a cluster I noticed gmond -h stated:

 -i, --mcast_if
   set the interface gmond is to multicast on
   default: first interface e.g. eth0

this however does not appear to be the case; as the multicast was going
out eth1.  So I was only seeing the master node in the php-rrd-client.

As soon as I used: gmond -i eth0  all the nodes in the cluster were
viewable through the php-rrd-client.
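
For anyone curious what choosing the multicast interface looks like at the socket level, here is a minimal Python sketch. It is not gmond's code; it assumes the multicast group and port that appear elsewhere in this archive (239.2.11.71 and 8649), and the function name make_mcast_sender is made up for illustration.

```python
import socket
import struct

MCAST_GROUP = "239.2.11.71"  # group quoted elsewhere in this archive
MCAST_PORT = 8649


def make_mcast_sender(iface_ip, ttl=1):
    """Create a UDP socket whose multicast traffic leaves via iface_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("B", ttl))
    # Without this option the kernel's routing table picks the outgoing
    # interface, which need not be the first one (eth0).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(iface_ip))
    return sock


if __name__ == "__main__":
    # "0.0.0.0" leaves the choice to the kernel; pass the address of the
    # cluster-facing interface to pin the traffic there instead.
    sender = make_mcast_sender("0.0.0.0")
    sender.sendto(b"hello", (MCAST_GROUP, MCAST_PORT))
    sender.close()
```

When no interface is pinned, the route that covers the multicast range decides where packets go, which would be consistent with traffic leaving on eth1 here.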

I've yet to get around to hacking the gmond source; but figured
I'd first mail the list to see if others have seen eth0 not being used as
the default multicast interface.

Thanks,
Mike