Thanks for all the replies. I've been silently taking down notes. Feel
free to keep the ideas coming.
-Matt
On Tue, Dec 13, 2011 at 6:43 AM, Chris Burroughs
chris.burrou...@gmail.com wrote:
On 12/09/2011 07:51 PM, Matt Massie wrote:
What are the things you would be most interested
We're in the process of pulling together a team to write an O'Reilly eBook
on Ganglia.
Here's a rough idea of some of the topics we could cover
- Ganglia's components and overall architecture
- Typical deployment configurations including simple steps for verifying
an installation (e.g.
For people who may not know me, I'm the person who wrote ganglia in 1999,
open-sourced it the following year and then worked to create a strong,
sustainable community around the software. Take a quick peek at the ganglia
web site http://ganglia.info/ and you see just how pervasively ganglia is
We've worked together remotely for many years, and the opportunity has
finally come for us to meet in person!
I invite you to attend the inaugural in-person meeting of the Ganglia
Development Team, scheduled for Thursday-Friday Feb 28-29 2008. We
don't have a finalized agenda yet, but our primary
one possible thing to check is ntp. we have seen problems where ntp will
shift the time into the past after gmond is started. gmond's metric
scheduler will then wait that extra time before collecting and sending
data. this is often a problem when you run gmond inside a virtual machine.
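The effect described above can be sketched in a few lines of Python. This is not gmond's scheduler code; it only assumes a scheduler that stores its next collection time as an absolute wall-clock timestamp. The start time, the 20 second interval, and the 300 second clock step are made-up values for illustration.

```python
def seconds_until_next_run(next_run_at, now):
    """Return how long a scheduler would sleep before an absolute deadline."""
    return max(0.0, next_run_at - now)


# Hypothetical values: wall-clock time at start-up and a 20 second interval.
start = 1_000_000.0
interval = 20.0
next_run_at = start + interval

# Normal case: 5 seconds after start-up, 15 seconds remain.
normal_wait = seconds_until_next_run(next_run_at, start + 5.0)

# ntp steps the wall clock 300 seconds into the past after start-up:
# the deadline is unchanged, so the wait grows by the size of the step.
stepped_wait = seconds_until_next_run(next_run_at, start + 5.0 - 300.0)

print(normal_wait, stepped_wait)
```

In Python, a scheduler built on time.monotonic() would not see such a step, since that clock is not affected by system clock updates.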
good
i posted your slide on the ganglia web site (http://ganglia.info)...
hope that's ok... happy to hear that ganglia is so useful for you and
your team.
flickr rocks
-matt
On Fri, 2007-04-27 at 06:00 -0700, john allspaw wrote:
Slides are here:
as far as the plans to revive the gmetric repository...
i just went to hack up a fix to the bot mess and found that someone else
has already been working on the problem too... since the form now has a
verification field.
whoever is working the gmetric repository please feel free to use the
it really is in your deployment...
(02) Have the design issues been resolved in the meantime ?
yes. however, gmetad still uses threads and is a bit of a beast but
that's ok. it's not meant to be light-weight or installed on every
machine (just the machine serving web pages).
--
matt massie
phone
! is there a way to automate this process so that we can
make it part of the build process? i'd love for us to be able to easily
build a setup.exe for each ganglia release. i created the first
setup.exe for 3.0.0 using some gui tool i found on tucows just to make
the point...
--
matt massie
phone
/lists/listinfo/ganglia-general
--
matt massie
phone: 415.692.0828 x2843
fax: 415.278.0441
http://archrock.com/
On Fri, 2006-12-29 at 11:30 -0800, Ben Hartshorne wrote:
On Thu, Dec 28, 2006 at 02:40:52PM -0800, Peter Mui wrote:
Hi All (at ganglia-general):
I've been talking to Matt Massie about re-doing the Ganglia website
at http://ganglia.info/
We're open to any or all ideas at this point
___ Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
x2843 or visit http://www.archrock.com/.
now that Primer Pack is out the door, i should have time soon to help
with our next ganglia release. hope you all have a great weekend!
* google queries
http://www.google.com
toney samuel wrote:
Hi i have installed as per the instructions on this link
http://www.ibm.com/collaboration/wiki/display/WikiPtype/ganglia
I am able to get the ganglia page and also the status of my node. I am
not getting
and
issue a warning?
4. What's the purpose of this: apr_gethostname( myname,
APRMAXHOSTLEN+1, global_context);
5. I'm sure there's a whole lot more I haven't tripped over yet...
Thanks,
Alex
matt massie wrote:
it is possible and not too painful.
path #1: use XDR directly in your
hello denis-
the part_max_used metric is derived by walking through each
partition and finding the one that is most full and reporting that.
so it's really the maximum percent used on one specific
partition. the part_max_used metric is really used as a warning
that a partition is
steve-
Ganglia is an open-source project that grew out of the University
of California, Berkeley Millennium Project which was initially
funded in large part by the National Partnership for Advanced
Computational Infrastructure (NPACI) and National Science
Foundation RI Award EIA-9802069.
thanks stefan for letting us know it wasn't working. it's working now.
-matt
On Dec 1, 2005, at 12:25 AM, [EMAIL PROTECTED] wrote:
view/save results in an error:
Warning: mysql_connect(): Unknown MySQL Server Host
'mysql' (0) in /home/groups/g/ga/ganglia/htdocs/gmetric/
only hosts specified in a udp_send_channel need to have a
udp_recv_channel and tcp_accept_channel defined. on host 192.168.1.254
you would need to have
udp_recv_channel {
port = 8649
}
tcp_accept_channel {
port = 8649
}
or something like that. all the hosts sending data don't need any
that is a common gotcha.
gmetad is designed to only collect data from the first host in a
data_source that it connects with. the idea is that all hosts listed as
a part of a data_source have redundant data. if one host fails, gmetad
can get the data from the next host in the data_source.
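A minimal sketch of that failover idea, in Python rather than gmetad's own C: try each host of a data source in order and take the data from the first one that accepts a connection. The function name, the `connect` parameter, and the host list are hypothetical; only the "first host that connects wins" behaviour comes from the description above.

```python
import socket


def fetch_from_data_source(hosts, connect=socket.create_connection, timeout=5.0):
    """Try each (host, port) in order; return data from the first that connects."""
    for host, port in hosts:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # This host is down or unreachable; fall through to the next one.
            continue
        with conn:
            chunks = []
            while True:
                chunk = conn.recv(4096)
                if not chunk:  # the peer closed the connection: report is complete
                    break
                chunks.append(chunk)
        return (host, port), b"".join(chunks)
    raise ConnectionError("no host in the data source could be reached")
```

Because every host in the list is assumed to hold the same (redundant) data, the remaining hosts are never contacted once one of them answers.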
here is an idea you might try.
all the rrd code is in ./gmetad/rrd_helpers.c
the function for creating rrds is RRD_create(). you can alter the
format of the round-robin archives there without breaking compatibility
(an upcoming version of gmetad will allow you to specify the archives in
the
. :)
-matt
Matt Massie wrote:
here is an idea you might try.
all the rrd code is in ./gmetad/rrd_helpers.c
the function for creating rrds is RRD_create(). you can alter the
format of the round-robin archives there without breaking compatibility
(an upcoming version of gmetad will allow you
Dan Moniz wrote:
Hi all,
I've been testing Ganglia for a while on a cluster of approximately 280
hosts. Approximately 260 of these are of one class -- data hosts -- and
another 18-20 are of another class -- compute hosts.
I had numerous issues getting Ganglia to work reliably while it was
i think i know what the bug is. i can give you a fix to try and see if
it works for you but it will require modifying the source. this fix
will be part of the next release 3.0.1.
the problem is in the process_collection_group() function near line 1600 of
./gmond/gmond.c. when there is a
please submit this problem to the bugzilla database at
http://bugzilla.ganglia.info/ and we'll try to address it in the next
release if possible.
this has been an issue in the past with people and it's something we
should address (at least via some configuration option).
if you want an
can you grep for the ip address of the interface specified in the
missing nodes configuration? i suspect that what might be happening
is that the address on the interface you have specified might not be
resolvable so you will not get the node246 string but something else or
only an ip
Vinay bansal wrote:
New to Ganglia. I am trying to understand the Ganglia metrics output. e.g.
...<METRIC NAME="cpu_nice" VAL="0.0" TYPE="float" UNITS="%" TN="176336"
TMAX="90" DMAX="0" SLOPE="both" SOURCE="gmond"/>
Not clear what does the following means
a) TN
b) TMAX
c) DMAX
Tn = time now
(which is the
reports on 3.0.0 have been mostly positive but there is one annoyance
that seems to be affecting many users so i thought i'd report a simple
workaround to prevent others from getting frustrated by the same problem.
if you find that gmond truncates the XML output on your
tcp_accept_channel,
please add your request to http://bugzilla.ganglia.info/ so that it can
be cataloged. good idea.
-matt
Kevin A. Burton wrote:
We're using gmetric prefixes to group common metrics together in the UI.
For example
mysql_*
jdk5_*
iostat_*
I assume there's no way to group these together in the
you need to compile with gcc. make sure CC=gcc before running
./configure.
good luck
-matt
Steve Jones wrote:
uname -a: IRIX64 hostname 6.5 07121149 IP35
file: ganglia-monitor-core-2.5.7.tar.gz
i unpacked and invoked the configure script. when running make, i receive
the following error
a new bugzilla site has been created to make submitting bug reports and
patches to the ganglia project much easier.
http://bugzilla.ganglia.info/
this site was created in anticipation of the upcoming 2.6.0 beta. we
hope this service will allow us to organize and move more
yemi-
you shouldn't have any problem running in safe mode.. except that you
will need to explicitly state the path to the rrdtool binary in the
safe configuration. otherwise, php will not allow it to be run. (i
can't remember exactly how that is done but i've seen it bounced around
the list).
robert-
an easy workaround would be to rebuild new RPMs on your RH7.3 boxes. it
should be as simple as...
% rpmbuild -ta ganglia-monitor-core-version.tar.gz
or it might be
% rpm -ta ganglia-monitor-core-version.tar.gz
on the old systems (i think rpmbuild is relatively new).
or you can just
dr. matroid (cool name)-
the fix would be to simply tell the web frontend where to find your
round-robin databases (since they are in a non-standard place).
all configuration options live in the ./config.php file. open it.. and
edit the line
$rrds = "$gmetad_root/rrds";
to read
$rrds =
mike-
i think you are right. the router configuration is the place to look.
setting the ttl in the header of a multicast packet works more like a
suggestion for the router. if the router is configured to forward
multicast traffic, it will decrement the ttl and pass on the message
otherwise it
originally the ganglia lists were open to make it easier for people to
post their questions and get answers. unfortunately, the lists have been
getting hit with increasing amounts of SPAM (one spam message is too
much). to solve the SPAM problem, i've changed all the mailing lists to
be closed.
phil-
this is a common question so i've added it to the ganglia troubleshooting
FAQ. please visit
http://ganglia.sourceforge.net/docs/#troubleshooting_(faq).
i hope it will help you solve your problem. please let us know.
--
-matt
Today, Phil Forrest wrote forth saying...
From: Phil
steve-
the single biggest problem with scaling gmetad is disk i/o problems. what
type of filesystem are you writing the gmetad RRDs to? most people have
had very good luck using a RAM-based filesystem and then periodically
syncing the data to disk.
for example in linux,
% mount -t tmpfs
jonathan-
i'm just guessing here..
when you say you've removed metrics from ganglia.. you are saying that you
modified ./gmond/metric.h and ./gmond/key_metrics.h and removed some
metrics.. right? it's important that the key_metrics.h and metric.h
headers match.
i would run an md5sum on the
steve-
steve is right on track with the suggestions. it appears that you are
running two different versions of ganglia 2.x at the same time. a quick
way to get the version of gmond is to run...
% /usr/sbin/gmond --version
(or wherever you installed gmond on each machine). so running that
prashant-
there are two ways to go here.
1.) just make your webserver gmond mute by adding the line
mute on
to your /etc/gmond.conf file. when gmond is mute it will stop talking
on the multicast channel and will not appear in the list of hosts.
2.) don't run gmond locally but rather plug
jonathon-
thanks for letting us know about the deadend link. if you want to see
demos visit the ganglia web site (http://ganglia.sf.net/) and look at the
demos section. there are five demo sites right now.
-matt
Today, Jonathan Pauli wrote forth saying...
From: Jonathan Pauli [EMAIL
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
russell-
another trick for debugging the problem is to examine the core dump. when
you run...
% ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max
jason-
i know very little about network hardware. you might want to take a look
at this link...
http://ganglia.sourceforge.net/ganglia_docs/notes.html#NOTES-CISCO
i know some cisco users have found it helpful. let me know if it is not
helpful
ganglia gets its host names from gethostbyaddr(). when gmond gets a
message from a remote gmond it reads the message header to get the IP
address of the sender. it then plugs that ip into gethostbyaddr.
gethostbyaddr behavior is determined by
john-
can you give us more details? what operating system are you compiling
ganglia on? what version of gcc are you using? are you using gcc/gmake?
what ./configure flags are you passing if any?
-matt
Today, John M Hicks wrote forth saying...
From: John M Hicks [EMAIL PROTECTED]
To:
marcia-
i'm sorry.. i don't know why the mtu functions were missing in the latest
release. the functions have been checked into our CVS tree and will be a
part of all future releases. i've attached the latest, greatest irix.c
source for you...
david-
How is it possible to start gmond or gmetad on boot. I have tried using
a startup script on boot to run gmond and it always errors out. I use
full paths to the executable and the configuration file, but it will not
run automatically. If I
you can change where the ./configure script looks for include files and
libraries (if they are in non-standard locations). this is a general
autoconf trick which might be helpful compiling other packages too.
% CFLAGS=-I/home2/globus/XXX/include \
CPPFLAGS=-I/home2/globus/XXX/include \
Today, Santanu Das wrote forth saying...
Hi all,
We have a 20 single CPU cluster up and running. Now we need to upgrade
its memory but before upgrading some tests are required in order to get
the desired result on a single node. So if I upgrade the memory of one
particular node, does
helge-
there is a simple answer to your problem. you need to compile the
monitor core with the (--enable-gexec) option so that gmond announces the
host is available for gexec to run on.
% ./configure --enable-gexec
if you have a host that you DON'T want to announce as a gexec host .. add the
prashant-
so when a node in the cluster dies the cluster size changes but the dead
node is not reported?
this is a new problem that i haven't heard of before. did gmond get
restarted after the node failed? ganglia knows that a node has died when it
stops getting heartbeats from a machine that it
kent-
i second steve's comments.. if you take a look at this slide from a talk
that i gave you'll see what's going on with multicast...
http://ganglia.sourceforge.net/talks/lug_lbl_talk/html/slide_6.html
.. using separate multicast channels will allow for hardware filtering at
the datalink
yasuhito-
this is a known bug and we are presently testing a solution which will be
incorporated into the next release.
--
matt
Tomorrow, Yasuhito Takamiya wrote forth saying...
Hi,
We are facing problems in running gmetad, we got following errors
in /var/log/messages in every 1.5
As part of a technical paper we are writing on ganglia we need to quantify
measurements and experiences people have found with ganglia. Anyone who
provides these details will receive clear kudos in the Acknowledgments
section of the paper. Because of the time pressures involved in this
i hope this will help to track down the problems people are having getting
ganglia up and running.
there is documentation for ganglia at
http://ganglia.sourceforge.net/ganglia_docs/
but i know that it is lacking and will be updated in time.
installation (step-by-step)
1. install gmond on all
lester-
sorry that i didn't get back to you sooner. i would recommend that you
use the linux.c in CVS as a test. we found that some of the variables in
the network stat functions were not being initialized. 2.5.2 will not
have that problem (of course). let me know if that fixes your
andy-
there is an interface option with gmetric to make sure it multicasts on
the correct interface.
# gmetric -i eth2 ...
# gmetric --mcast_if eth2 ...
that will ensure that gmetric communicates with the gmonds listening on
crazy interfaces. let me know if that solves the problem.
good
phil-
gmond isn't built to have multiple clusters in the xml output but gmetad
is specifically designed for that.
just put your smaller clusters on unique multicast channels (e.g. compute
cluster on 239.2.11.70 and tape robots on 239.2.11.71 etc etc)
then just use gmetad to pull each of the
jfreidin-
you can use gmetric to add monitoring for any extra metrics you want...
http://ganglia.sourceforge.net/ganglia_docs/usage.html#GMETRIC-USAGE
--
matt
Today, [EMAIL PROTECTED] wrote forth saying...
Why doesn't Ganglia monitor things like disk errors? Why weren't more node
health
Today, Steven Wagner wrote forth saying...
Looks like Matt replied to me off-list in the hope he could avoid making me
look like an idiot. And here I thought he was going to be a few days
behind in his e-mail... Little does he know that I actually *am* an idiot.
for being an idiot .. you
thanks for the bug report. the development team will take a look at the
changes to the 2.5.30 kernel and /proc/stat and be sure to update our
code.
--
matt
Sep 24, Benoit des Ligneris wrote forth saying...
Hello,
We tried to use ganglia on diskless nodes with system in a ramdisk. We
thanks, lester, for the information. i've added it to the
documentation...
http://ganglia.sourceforge.net/ganglia_docs/notes.html#NOTES-CISCO
thanks
--
matt
Oct 16, Lester Vecsey wrote forth saying...
Perhaps information regarding gmond on networks set up through cisco
catalyst switches
Oct 17, Michael William Knop wrote forth saying...
A few questions about Mr. Wagner's explanation of gmond's metrics:
cpu_user
cpu_system
cpu_nice
cpu_idle
cpu_wio
cpu_aidle
Concerning the Percentage of CPU cycles spent... metrics; is that since
boottime or since the last
experiencing...
.. i'm getting ready to go camping until tomorrow morning.. so if you
don't hear back from me it's because i'm looking for orion in the sky and
drinking a cold one.
bottoms up
-matt
Today, Steven A. DuChene wrote forth saying...
On Fri, Oct 11, 2002 at 12:30:21AM -0700, matt massie
brian-
any program running.. requires CPU cycles. running gmond on a node takes
much less CPU than running top... for example (from a node on a 100 node
cluster)
# top -b | egrep -w 'gmond|top'
27988 massie 14 0 1084 1080 760 R 5.6 0.0 0:00 top
3034 nobody 9 0 1684
leif-
i've been wanting to have a way to implement an active alerting mechanism
for a while. the development team would love some help if you're willing
to donate a little time.
i have an idea for a quick and smart hack (i think). gmetad is already
doing the hardest part of this work.
here's
doug-
just a little history of the location tag. the force behind the
web-frontend is Federico Sacerdoti (a great developer from the Grids and
Clusters group at the San Diego Supercomputer Center).
the NPACI Rocks cluster installation (http://rocks.npaci.edu/)
automagically names each cluster
ben-
here is how to troubleshoot your setup.
1. make sure that gmond is running on your web server
% ps -ef | grep gmond
nobody 26677 1 0 Sep19 ? 00:00:00 /usr/sbin/gmond
nobody 26678 26677 0 Sep19 ? 00:00:00 /usr/sbin/gmond
nobody 26679 26678 0 Sep19 ?
didier-
i need to add a section in the documentation talking about this since it
seems to be a common question.
when you use...
mcast_if eth1
.. in /etc/gmond.conf that tells gmond to send its data out the eth1
network interface but that doesn't necessarily mean that the source
address of the
balaji-
i'm not sure if this question was answered. right now, there is no way to
remove a metric from the list without restarting gmond. the assumption is
that once you create a metric you will periodically send updated values
which overwrite the old value.
ganglia 2.5.0 is being released
karl-
i'm glad that you sent the email but you did scare the bajeebahs out of
me! :) i was just working out some last minute details with the planet
lab group for the 2.5.0 release when i saw your message.
don't be so hard on yourself. better safe than sorry.
-matt
Today, Karl Kopper wrote
mark-
i've seen this behavior on the machine running the ganglia demo page but
it's just a p2 with 128 mbs of memory (soon to be upgraded).
i'm rewriting gmetad in C right now and will be incorporating it into the
monitoring-core distribution soon. the biggest bottleneck right now with
i'm sorry but i don't have access to any SuSE boxes to work this problem
out on.
# rpm -qa openssl
openssl-0.9.6b-8
# rpm -ql openssl
/lib/libcrypto.so.0.9.6b
/lib/libssl.so.0.9.6b
here's an idea to try. we've built the gexec tarball so that you can
easily make an RPM from it. just type
#
yujun wu-
steve's message to you is right on target as far as the security of gmond
and using telnet to get the data.
the reason gmond exports information in XML is because there is an XML
parser available for just about every programming language on the planet.
that makes it easy to build
Today, Michael Dingwall wrote forth saying...
Hey guys,
Thanks for the help. Found out that the owner for the rrds had been
changed to nobody. That really screwed it up. Also, I don't think
that they have to be owned by the apache user, because they show up when
owned by the root. So, as
the easiest way is to restart each gmond in your
$GMETAD_ROOT/etc/gmetad_sources file.
-matt
Wednesday, YUEN Shu On wrote forth saying...
Hello,
I want to remove some entries from the metric to
obtain a more clear webpage.
But how to do it?
Today, Sumanth J.V wrote forth saying...
When you choose the following metrics ... hostname, ip, gmond_started
and reported for an individual cluster (not the meta cluster) the
graphs at the end of the page for the individual nodes are not loaded.
that's a bug. thanks for the feedback. i'll
henry-
when you say the daemon runs.. does that mean you can access the XML
information? can you do a telnet localhost 8651 on the gmetad box and
send me the output? it might be useful for tracking down what the problem
might be.
thanks!
-matt
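For anyone who would rather script the check than use telnet, here is a small Python sketch that reads whatever the port sends and lists the hosts it mentions. Port 8651 is the one from the message above; the HOST element and its NAME attribute are an assumption about the XML layout, and both function names are made up for this example.

```python
import socket
import xml.etree.ElementTree as ET


def read_all(host, port, timeout=5.0):
    """Connect, read until the peer closes the connection, return the raw bytes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def host_names(xml_bytes):
    """Return the NAME attribute of every HOST element (assumed layout)."""
    root = ET.fromstring(xml_bytes)
    return [host.get("NAME") for host in root.iter("HOST")]


# Example use on the gmetad box (not run here, since it needs a live gmetad):
#   print(host_names(read_all("localhost", 8651)))
```

An empty list or a parse error from host_names() would suggest the daemon is running but not reporting any hosts, which is the situation this thread is trying to pin down.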
Jun 24, Henry Leyh wrote forth saying...
Hi,
Today, Joe Griffin wrote forth saying...
I have ganglia (2.4.1) w/o gmetad working great on an Itanium 2 w/
MSC.Linux distribution. However, I just tried installing gmetad-0.1.1
and gmetad-web-frontend-0.1.0. Now the ganglia page shows no hosts.
Can someone tell me how I must configure
edward-
if you look in request.c on line 50-51 you'll see
h = gethostbyname(ip);
e_assert(h != NULL);
it appears that you are having name service problems since gethostbyname() is
failing. are you still having this problem or did you get it fixed?
-matt
Thursday, [EMAIL PROTECTED] wrote forth
May 23, marino vetuschi zuccolini wrote forth saying...
Hello to all
I've a dual frontend with two eth cards. The internal net (eth0) is
10.0.0.* and spans from 1 (the frontend) to 6 (5 dual slaves): the
nodes are called baxeico** (from 00 to 05). Gmond runs on all the
nodes as well
i have created a new mailing list for people who are interested in doing
development on ganglia.
this list, ganglia-general (which btw i wish i would have named
ganglia-users), is an open forum for general questions and tips about how
to use ganglia. the new list, ganglia-developers, will be
i have been so busy working on the core components of ganglia that i have
neglected the web client. there are so many things that can be updated
and enhanced in the client and i just don't have the time to do it.
i have received many patches and ideas from individuals that i just don't
have
...
With the additional caveat that one must stop _all_ participating gmonds.
thanks,
-ryan
On Fri, 26 Apr 2002, matt massie wrote:
ryan-
right now, the only way to do it is..
# /etc/rc.d/init.d/gmond stop
# sleep 90
# /etc/rc.d/init.d/gmond start
-matt
Today, Ryan Sweet wrote forth saying
i THINK i might have a trick that can be helpful for some of your
configurations but i'm not sure. i'm explaining it here in case someone
out there wants to try this out.
on linux, you can enable ip forwarding with a simple command
# echo 1 > /proc/sys/net/ipv4/ip_forward
altering this value
i'm sorry that i haven't responded earlier but i'm in the throes of final
testing and documentation of the execution components of ganglia. i plan
to release them this week.
as far as the trust relationships...
gmond will only store data that it receives via the multicast channel at
this time.
Today, D'Onofrio Florindo wrote forth saying...
Dear Sir,
I'm Florindo D'Onofrio. I'm a student of Computer Science of the
Benevento's Unisannio University (Italy). I have found your e-mail in
the Ganglia Cluster Toolkit v2.1.2, Monitoring Core Documentation. My
teacher has fixed me the
fredrik-
i just added a mute and deaf mode to gmond in order to help you. just
start the webserver gmond with the --mute option. It will listen and
process all multicast traffic but will not multicast its own state (and
therefore will not show up in the list).
i'll release version 2.1.3
i've had multiple requests from ganglia users for help in generating a
real-time load-balanced MPI machinefile to use for spawning MPI jobs.
to get groups up and running until i have a strong C API, i've posted a
perl script on the ganglia download page which will create a MPI
machinefile on the
wow it works!
-matt