Alex --
Thanks for the details. Telnetting to a gmond XML port to dump internal
state is a nice debugging technique.
One of my problems is that I'm running a secondary daemon using the
gmetric subroutine libraries, and it took me a while to realize that this
daemon is in some ways equivalent to 'gmond'. In particular, I have to
restart it in addition to 'gmond'. The problem was immediately obvious
once I used the telnet trick you mentioned.
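For anyone else following along, the same dump can be scripted instead of
telnetted; here is a minimal sketch, assuming gmond's default
tcp_accept_channel port of 8649 (adjust host and port for your own setup):

#!/usr/bin/env python
# Minimal sketch: connect to a gmond (or gmetric-style daemon) TCP
# channel and print the XML state it dumps.  Assumes the default
# port 8649; pass a host and port to override.
import socket
import sys

host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
port = int(sys.argv[2]) if len(sys.argv) > 2 else 8649

sock = socket.create_connection((host, port))
chunks = []
while True:
    data = sock.recv(4096)
    if not data:
        break
    chunks.append(data)
sock.close()
sys.stdout.write(b"".join(chunks).decode("utf-8", "replace"))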
So, for the missing CPU data issue... Let me slowly write down what's
happening to make sure I understand it. I'm running a multicast gmond on
each cluster to aggregate data, which implies that each node of the
cluster eventually aggregates data about all other nodes of the same
cluster. I'm using a centralized gmetad to pull data from one node of
each cluster. Presumably 'gmetad' doesn't really remember a whole lot
about the outlying nodes.
I go out to the cluster and kill gmond on each node. Then I go
through the nodes and start gmond back up on each one. As each node
starts, it broadcasts its number of CPUs throughout the cluster. Thus,
when I'm done restarting, one of the nodes (the first to restart) knows
how many CPUs each node has, but the nodes that were restarted last don't
have complete state information. When I then restart 'gmetad' at the
central location, it connects to one of the nodes in the cluster, and if
that node doesn't have full state information, gmetad incorrectly reports
the number of CPUs in the cluster. [Since I am using a background process
that gathers metrics separately from 'gmond' relatively frequently, this
background process is probably causing all nodes in the cluster to know
about all of the hosts in the cluster, if not all of the metrics of all
of the hosts in the cluster.]
This will eventually correct itself since all metrics are
periodically rebroadcast.
Possible alternate fixes may include:
(1) When a node receives a broadcast from another node that it
hasn't seen before, it may want to send its own data back to that node
(see the sketch after the bracketed note below). If I start node A and
it broadcasts to an empty cluster, then I start node B and it broadcasts
to A, then it might be nice if node A sends data back to B, because A can
reasonably infer that B doesn't have A's state and that B should have
A's state.
(2) Maybe daemons that gather metrics should not directly
broadcast them throughout a cluster. Instead, the metrics should be
accumulated within a central daemon and then broadcast. (In other
words, treat 'gmond' as having two separate components: a metrics
gathering component and a metric/cluster aggregation component. Then
both the metrics gathered by 'gmond' and the metrics that I am
gathering would be handed to the aggregation component.) [This is
probably not useful without also implementing (1) above.]
(3) Alex implies that there may be alternative ways to configure
a cluster, without using multicast, which may handle some or all
aspects of this problem.
[We can treat each node as maintaining a list of metrics and
their current values and broadcasting deltas to that list on a periodic
basis. In the current system, it is possible to receive a delta without
having the background data to which the delta applies. Multiple daemons
each spitting out deltas to their own metrics is compatible with the
current model. However, we may want to have all the background data in
a single list; we may also want each node to know which metric gathering
daemons exist so that we can better report when one of the metric
gathering daemons dies.]
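To make proposal (1) above a little more concrete, here is a rough,
purely hypothetical sketch of the handshake I have in mind. None of
these names exist in gmond; they are placeholders for illustration only.

# Hypothetical sketch of proposal (1): when a node first hears from a
# peer it has never seen, it sends its own full state straight back so
# the newcomer doesn't have to wait for the next periodic rebroadcast.
# All names here are made up; this is not gmond code.

known_hosts = set()               # peers we've already heard from
local_state = {"cpu_num": 4}      # this node's own metrics (illustrative)
cluster_state = {}                # peer -> {metric: value}

def unicast(peer, metric, value):
    # Stand-in for a real unicast send back to 'peer'.
    print("unicast to %s: %s = %s" % (peer, metric, value))

def on_metric_received(peer, metric, value):
    cluster_state.setdefault(peer, {})[metric] = value
    if peer not in known_hosts:
        known_hosts.add(peer)
        # First contact: the peer can't have our state yet, so push our
        # full local state back to it immediately.
        for name, val in local_state.items():
            unicast(peer, name, val)

# Example: node B announces itself to node A; A replies with all it has.
on_metric_received("nodeB", "cpu_num", 2)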
Moving on to the issue of correcting configuration problems. While we
can say that having a timeout is the way to correct configuration
issues, this is not necessarily the best implementation. Part of my
problem is that I have multiple daemons that gather and broadcast
metrics. If we address parts of that as discussed above, then it
becomes easier to fix the broadcast address by just resetting a single
daemon.
So, at the current time, we can configure the system in a couple of
ways. We can configure the system so that a host is considered removed
from a cluster when the host has been down sufficiently long, or we can
manually remove the host from the cluster by restarting all gmond
daemons in the cluster.
Possible alternate approaches might include providing a command that
could be sent to a 'gmond' daemon in a cluster to remove a host from the
cluster. It may be that there already exist mechanisms to restart all
gmond daemons in a cluster, but this mechanism is not integrated into
ganglia.
So, thanks, I think I now understand what's going on.
Cheers, Chuck
Alex Balk wrote:
Hi Chuck,
See below...
Chuck Simmons wrote:
The number of cpus does get sorted out, but I don't believe that
restarting 'gmond' is a solution. The problem occurs after restarting
a number of 'gmond' processes, and the problem is caused because
'gmond' is not reporting the information. Does 'gmond' maintain a
timestamp on disk as to when it last reported the number of cpus and
insist on waiting sufficiently long to report again? Does the
collective distributed memory of the system remember when the number
of cpus was last reported but not remember what the last reported
value was? Is there any chance that anyone can give me hints to how
the code works without me having to read the code and reverse engineer
the intent?
The reporting interval for number of CPUs is defined within /etc/gmond.conf.
For example:
collection_group {
  collect_once = yes
  time_threshold = 1800
  metric {
    name = "cpu_num"
  }
}
The above defines that the number of CPUs is collected once at gmond
startup and reported every 1800 seconds.
Your problem occurs because gmond doesn't save any data on disk, but
rather in memory. This means that if you're using a single gmond
aggregator (in unicast mode) and that aggregator gets restarted, it
will not receive another report of the number of CPUs until 1800 seconds
have elapsed since the previous report.
The case of multicast is a more interesting one, since every node holds
data for all nodes on the multicast channel. The question here is
whether an update with a newer timestamp overrides all previous XML data
for the host. I don't think that's the case; it seems more likely that
only existing data is overwritten... but then, I don't use multicast, so
you may qualify this answer as throwing useless, obvious crap your way.
Generally speaking, there are 2 cases when a host reports a metric via
its send_channel:
1. When a time_threshold expires.
2. When a value_threshold is exceeded.
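For illustration (the particular numbers below are made up, not
recommendations), a collection_group along these lines would report
load_one at least every 20 seconds, and sooner whenever the value has
changed by more than 1.0 since the last report:

collection_group {
  collect_every = 15
  time_threshold = 20
  metric {
    name = "load_one"
    value_threshold = "1.0"
  }
}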
You're welcome to read the code for more insight, but a simple telnet to
a predefined TCP channel would probably be quicker. You could just look
at the XML data and compare pre-update and post-update values (yes,
you'll need to take note of the timestamps - again, in the XML).
I understand that I can group nodes via /etc/gmond.conf. The question
is, once I have screwed up the configuration, how do I recover from
that screw up? I have restarted various gmetad's and various
gmond's. The grouping is still incorrect. Exactly which gmetad's and
gmond's do I have to shut down, and when? And, again, my real question is
about understanding how the code works -- how the distributed memory
works.
As far as I know, you cannot recover from a configuration error unless
you've made sure host_dmax was set to a fairly small, non-zero value.
From the docs:
The host_dmax value is an integer with units in seconds. When set to
zero (0), gmond will never delete a host from its list even when a
remote host has stopped responding. If host_dmax is set to a positive
number then gmond will flush a host after it has not heard from it for
host_dmax seconds. By the way, dmax means ``delete max''.
This way, once a host's configuration is modified to point at a
different send channel, the aggregator(s) on its previous channel will
forget about its existence once the host_dmax interval expires.
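For example, something like this in the globals section of gmond.conf
(600 is just an illustrative value) should make a gmond forget about a
silent host after ten minutes:

globals {
  host_dmax = 600   /* seconds; 0 means never delete a host */
}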
Personally, I don't use multicast for various reasons, the main one
actually being its main advantage - every node keeps data on the entire
cluster. While this provides for maximal high availability, it also has
a bigger memory footprint, especially when you have a few thousand
nodes.
I'd much rather be ignored than have people try to pawn off facile
answers on me.
I'd provide you with more information on a possible setup which balances
high availability with performance, but I wouldn't want to overflow you
with useless data any more than I've done so far.
Let me know if you'd like more information.
Cheers,
Alex
Cheers, Chuck
Bernard Li wrote:
Hi Chuck:
For the first issue - give it time, it should sort itself out.
Alternatively, you can find out which node is reporting incorrect
information, and restart gmond on it.
For the second issue, you can group nodes into different data_source
entries via the multicast port in /etc/gmond.conf. Use the same port
number for nodes you want to belong to the same group.
You'll need to restart gmetad and gmond for the new groupings to take
effect.
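As a sketch of what that can look like (the host names and the 8650 port
below are made up for illustration), each group of gmonds shares one port
in /etc/gmond.conf, and gmetad.conf gets one data_source line per group:

/* /etc/gmond.conf on every node of one group, e.g. the staiu hosts */
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8650
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8650
  bind = 239.2.11.71
}
tcp_accept_channel {
  port = 8650
}

# /etc/gmetad.conf on the central collector
data_source "staiu" staiu01:8650 staiu02:8650
data_source "staqp05-08" staqp05:8649 staqp06:8649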
Cheers,
Bernard
------------------------------------------------------------------------
*From:* [EMAIL PROTECTED] on behalf of
Chuck Simmons
*Sent:* Wed 22/03/2006 17:54
*To:* [email protected]
*Subject:* [Ganglia-developers] reorganizing clusters
I need help understanding two things.
I currently have a grid. One of the clusters in the grid is named
"staiu", and the "grid" level web page reports that this cluster has 8
hosts containing 4 CPUs. In actuality, it has 8 hosts each containing 4
CPUs, but apparently the hosts are not reporting their current number of
CPUs to the front end. Why not? I recently restarted gmond on each of
the 8 hosts.
Another cluster is named "staqp05-08" and the "grid" level web page
reports that this has 12 hosts. In actual fact, it has only 4 hosts.
The extra 8 hosts are the 8 hosts of the 'staiu' cluster. On the
cluster level page for staqp05-08, the "choose a node" pull-down menu
lists the 8 staiu hosts, the "hosts up" number includes the staiu
hosts, and there are undrawn graphs for each of the staiu hosts in the
"load one" section. What do I have to do so that the web pages or gmond
daemons or whatever won't think that the staqp cluster contains the
staiu hosts? What is the specific mechanism that causes this
association to persist despite having shut down all staqp gmond daemons
and both the gmond and gmetad daemons on the web server simultaneously,
and then started up that collection of daemons?
Thanks, Chuck
_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers