Thanks Matt... I don't really understand the issue myself, being a UNIX
person and not a network person. They spoke of routers and turning on
multicast routing and new tables and the inability of certain classes of
addresses to receive multicast traffic.
But basically they are now not unhappy with me. What you say makes sense
to me too, so I don't really understand their issue.
The real benefit to me, however, is that I only need to restart the
gmond daemon on the two listener machines when a node is removed from
the cluster (for repair or retirement) rather than the gmond daemons on
all the nodes as I had to do when they all listened.
Thanks,
Paul
Matt Massie wrote:
paul-
i'm a little confused here. if you run all your cluster hosts in "deaf"
mode except for two hosts, then the amount of multicast traffic would
not change.
with your current configuration every host is still multicasting on the
channel (since they are not "mute") but only two hosts are listening and
saving the data on the multicast channel.
the only thing you save with your configuration is some memory on each
of your cluster nodes (since they are not storing the data).
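for reference, the knobs involved in a 2.5.x-style gmond.conf look
roughly like the lines below (i'm quoting the directive names from
memory of the sample gmond.conf, so double-check them against the
config that shipped with your version):

   # on the two listener/collector hosts: listen, store, keep multicasting
   deaf  off
   mute  off

   # on every other node: keep multicasting, but don't listen or store
   deaf  on
   mute  off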
a 160 node cluster should only use about 65 Kb/s of network bandwidth.
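(back-of-the-envelope, assuming that 65 Kb/s figure is the total on the
multicast channel: 65 / 160 works out to roughly 0.4 Kb/s, or about 50
bytes per second, contributed per node.)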
what they might have been seeing is a spike that can occur when lots of
hosts crash and then rejoin the group, or you reboot the entire
cluster. traffic increases for about five minutes or so as each node
syncs with the rest of the cluster.
if you are really having problems with multicast, you might try the
2.6.0 beta which supports unicast UDP. in that case, all nodes in a
cluster send their messages directly to one (or a few) hosts... which
is what i think you are trying to accomplish.
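a rough sketch of what that looks like, using the newer
udp_send_channel / udp_recv_channel syntax -- the exact directive names
in the 2.6.0 beta may differ, and the collector address below is just a
placeholder:

   # on every node: send metrics straight to the collector host
   udp_send_channel {
     host = 10.0.0.1   # placeholder address of your collector
     port = 8649
   }

   # on the collector host(s) only: accept the unicast traffic
   udp_recv_channel {
     port = 8649
   }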
-matt
On Fri, 2004-06-04 at 10:51, Paul Henderson wrote:
We don't really consider the ganglia monitoring critical, so we can
sustain what would be an exceedingly rare failure like both nodes going
down.
The real reason I did this was to reduce multicast traffic... the
network guys were getting blue in the face talking about how 160 nodes
were all broadcasting and listening at the same time. I don't understand
the actual mechanics, but they are now happy (or should I say
"marginally happier"... they never seem to really be happy ;-)
Paul
Princeton Plasma Physics Lab
Bernard Li wrote:
Hey Paul:
But I guess in the odd chance of both of the two nodes going down, your
history will be lost...
Of course if you are using Ganglia on a large cluster, you probably
don't want every node to be sending packets to each other ;-)
Cheers,
Bernard
-----Original Message-----
From: Paul Henderson [mailto:[EMAIL PROTECTED]
Sent: Friday, June 04, 2004 10:38
To: Johnston Michael J Contr AFRL/DES
Cc: Bernard Li; [email protected]
Subject: Re: [Ganglia-general] All my nodes listed as clusters
What I've been doing is running gmond on all my cluster nodes, but
making all but 2 of my 160 nodes "deaf" (see gmond.conf). All the nodes
then multicast their information, but only the two listeners hold the
data; the other nodes just broadcast and don't hold any data.
This is *really* useful, because if one node dies or is moved, then you
don't have to restart gmond on every single node to get it to 'forget'
the node... you just need to do it on the two listening nodes. Also,
network traffic is significantly reduced.
Paul
Princeton Plasma Physics Lab
Johnston Michael J Contr AFRL/DES wrote:
Thanks for the response Bernard!
I guess I didn't think that I could only put 1 node in the data_source
line, because how does it know to go and collect the information from
the other nodes? Does it just scan the subnet looking for any machine
running gmond? Every one of my nodes has the exact same gmond.conf
file on it with the name of my cluster in it. Is that how it knows?
Thanks for asking about the graphs... Thanks to everyone's pointers, I
learned that I had listed the path to the RRDtool directory, but
hadn't put the executable name into the path. After I changed that it
all started working... ;) Ganglia is really awesome!
Mike
------------------------------------------------------------------------
From: Bernard Li [mailto:[EMAIL PROTECTED]
Sent: Friday, June 04, 2004 11:18 AM
To: Johnston Michael J Contr AFRL/DES; [email protected]
Subject: RE: [Ganglia-general] All my nodes listed as clusters
If you only have one cluster, you only need one data_source (think of
the data_source as the headnode of your cluster, if you will).
So you just need one entry for data_source - you can put more than one
node in the data_source entry for redundancy purposes.
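For example, something like this in gmetad.conf (the hostnames here are
just placeholders):

   data_source "MyCluster" 60 headnode1:8649 headnode2:8649

As far as I know, gmetad polls the first host listed and only falls
over to the next one if that host can't be reached.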
So I take it you can see your graph now and the previous thread you
posted is dead?
Cheers,
Bernard
------------------------------------------------------------------------
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Johnston Michael J Contr AFRL/DES
Sent: Friday, June 04, 2004 8:37
To: [email protected]
Subject: [Ganglia-general] All my nodes listed as clusters
I have a silly question, as usual...
When I bring up the view of my cluster, it comes up as a Grid... so it
looks like this:
Grid > MyCluster > Choose a Node
I'm guessing that's because in my gmetad.conf file I have every
node in my cluster listed as:
data_source "N1" 60 192.168.3.2:8649
data_source "N2" 60 192.168.3.3:8649
I'm sure that I'm listing them wrong because Ganglia thinks that
each node is its own cluster. My question is how do I make them
appear like one unit as I see in the demo pages? Do I add them all to
one data_source line?
On a side question, is it normal for my head node to always be in the
red? It looks like it's only using about 8% CPU, but it's always red
or orange.
--
Paul Henderson
UNIX Systems Engineering Group
Princeton Plasma Physics Laboratory
Princeton, NJ 08543
(609) 243-2412