Alex --
Thanks for the details. Telnetting to a gmond XML port to dump internal
state is a nice debugging technique.
One of my problems is that I'm running a secondary daemon using the
gmetric subroutine libraries, and it took me a while to realize that this
daemon is in some ways equivalent to 'gmond'. In particular, I have to
restart it in addition to 'gmond'. The problem was immediately obvious
once I used the telnet trick you mentioned.
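For anyone else following along, the same dump can be scripted instead of
telnetted; here is a minimal sketch, assuming gmond's default
tcp_accept_channel port of 8649 (adjust host and port for your own setup):

#!/usr/bin/env python
# Minimal sketch: connect to a gmond (or gmetric-style daemon) TCP
# channel and print the XML state it dumps.  Assumes the default
# port 8649; pass a host and port to override.
import socket
import sys

host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
port = int(sys.argv[2]) if len(sys.argv) > 2 else 8649

sock = socket.create_connection((host, port))
chunks = []
while True:
    data = sock.recv(4096)
    if not data:
        break
    chunks.append(data)
sock.close()
sys.stdout.write(b"".join(chunks).decode("utf-8", "replace"))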
So, for the missing CPU data issue... Let me slowly write down what's
happening to make sure I understand it. I'm running a multicast gmond on
each cluster to aggregate data, which implies that each node of the
cluster eventually aggregates data about all other nodes of the same
cluster. I'm using a centralized gmetad to pull data from one node of
each cluster. Presumably 'gmetad' doesn't really remember a whole lot
about the outlying nodes.
I go out to the cluster and kill gmond on each node. Then I go
through the nodes and start gmond back up on each one. As each node
starts, it broadcasts its number of CPUs throughout the cluster. Thus,
when I'm done restarting, one of the nodes (the first to restart) knows
how many CPUs each node has, but the nodes that were restarted last don't
have complete state information. When I then restart 'gmetad' at the
central location, it connects to one of the nodes in the cluster, and if
that node doesn't have full state information, gmetad incorrectly reports
the number of CPUs in the cluster. [Since I am using a background process
that gathers metrics separately from 'gmond' relatively frequently, this
background process is probably causing all nodes in the cluster to know
about all of the hosts in the cluster, if not all of the metrics of all
of the hosts in the cluster.]
This will eventually correct itself since all metrics are
periodically rebroadcast.
Possible alternate fixes may include:
(1) When a node receives a broadcast from another node that it
hasn't seen before, it may want to send its own data back to that node
(see the sketch after the bracketed note below). If I start node A and
it broadcasts to an empty cluster, then I start node B and it broadcasts
to A, then it might be nice if node A sends data back to B, because A can
reasonably infer that B doesn't have A's state and that B should have
A's state.
(2) Maybe daemons that gather metrics should not directly
broadcast them throughout a cluster. Instead, the metrics should be
accumulated within a central daemon and then broadcast. (In other
words, treat 'gmond' as having two separate components: a metrics
gathering component and a metric/cluster aggregation component. Then
both the metrics gathered by 'gmond' and the metrics that I am
gathering would be handed to the aggregation component.) [This is
probably not useful without also implementing (1) above.]
(3) Alex implies that there may be alternative ways to configure
a cluster, without using multicast, which may handle some or all
aspects of this problem.
[We can treat each node as maintaining a list of metrics and
their current values and broadcasting deltas to that list on a periodic
basis. In the current system, it is possible to receive a delta without
having the background data to which the delta applies. Multiple daemons
each spitting out deltas to their own metrics is compatible with the
current model. However, we may want to have all the background data in
a single list; we may also want each node to know which metric gathering
daemons exist so that we can better report when one of the metric
gathering daemons dies.]
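To make proposal (1) above a little more concrete, here is a rough,
purely hypothetical sketch of the handshake I have in mind. None of
these names exist in gmond; they are placeholders for illustration only.

# Hypothetical sketch of proposal (1): when a node first hears from a
# peer it has never seen, it sends its own full state straight back so
# the newcomer doesn't have to wait for the next periodic rebroadcast.
# All names here are made up; this is not gmond code.

known_hosts = set()               # peers we've already heard from
local_state = {"cpu_num": 4}      # this node's own metrics (illustrative)
cluster_state = {}                # peer -> {metric: value}

def unicast(peer, metric, value):
    # Stand-in for a real unicast send back to 'peer'.
    print("unicast to %s: %s = %s" % (peer, metric, value))

def on_metric_received(peer, metric, value):
    cluster_state.setdefault(peer, {})[metric] = value
    if peer not in known_hosts:
        known_hosts.add(peer)
        # First contact: the peer can't have our state yet, so push our
        # full local state back to it immediately.
        for name, val in local_state.items():
            unicast(peer, name, val)

# Example: node B announces itself to node A; A replies with all it has.
on_metric_received("nodeB", "cpu_num", 2)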
Moving on to the issue of correcting configuration problems. While we
can say that having a timeout is the way to correct configuration
issues, this is not necessarily the best implementation. Part of my
problem is that I have multiple daemons that gather and broadcast
metrics. If we address parts of that as discussed above, then it
becomes easier to fix the broadcast address by just resetting a single
daemon.
So, at the current time, we can configure the system in a couple of
ways. We can configure the system so that a host is considered removed
from a cluster when the host has been down sufficiently long, or we can
manually remove the host from the cluster by restarting all gmond
daemons in the cluster.
Possible alternate approaches might include providing a command that
could be sent to a 'gmond' daemon in a cluster to remove a host from the
cluster. It may be that there already exist mechanisms to restart all
gmond daemons in a cluster, but this mechanism is not integrated into
ganglia.
So, thanks, I think I now understand what's going on.
Cheers, Chuck
Alex Balk wrote:
Hi Chuck,
See below...
Chuck Simmons wrote:
The number of cpus does get sorted out, but I don't believe that
restarting 'gmond' is a solution. The problem occurs after restarting
a number of 'gmond' processes, and the problem is caused because
'gmond' is not reporting the information. Does 'gmond' maintain a
timestamp on disk as to when it last reported the number of cpus and
insist on waiting sufficiently long to report again? Does the
collective distributed memory of the system remember when the number
of cpus was last reported but not remember what the last reported
value was? Is there any chance that anyone can give me hints to how
the code works without me having to read the code and reverse engineer
the intent?
The reporting interval for number of CPUs is defined within /etc/gmond.conf.
For example:
collection_group {
  collect_once = yes
  time_threshold = 1800
  metric {
    name = "cpu_num"
  }
}
The above defines that the number of CPUs is collected once at gmond
startup and reported every 1800 seconds.
Your problem occurs because gmond doesn't save any data on disk, but
rather in memory. This means that if you're using a single gmond
aggregator (in unicast mode) and that aggregator gets restarted, it
will not receive another report of the number of CPUs until 1800 seconds
have elapsed since the previous report.
The case of multicast is a more interesting one, since every node holds
data for all nodes on the multicast channel. The question here is
whether an update with a newer timestamp overrides all previous XML data
for the host. I don't think that's the case; it seems more likely that
only existing data is overwritten... but then, I don't use multicast, so
you may qualify this answer as throwing useless, obvious crap your way.
Generally speaking, there are 2 cases when a host reports a metric via
its send_channel:
1. When a time_threshold expires.
2. When a value_threshold is exceeded.
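For illustration (the particular numbers below are made up, not
recommendations), a collection_group along these lines would report
load_one at least every 20 seconds, and sooner whenever the value has
changed by more than 1.0 since the last report:

collection_group {
  collect_every = 15
  time_threshold = 20
  metric {
    name = "load_one"
    value_threshold = "1.0"
  }
}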
You're welcome to read the code for more insight, but a simple telnet to
a predefined TCP channel would probably be quicker. You could just look
at the XML data and compare pre-update and post-update values (yes,
you'll need to take note of the timestamps - again, in the XML).
I understand that I can group nodes via /etc/gmond.conf. The question
is, once I have screwed up the configuration, how do I recover from
that screw up? I have restarted various gmetad's and various
gmond's. The grouping is still incorrect. Exactly which gmetad's and
gmond's do I have to shut down, and when? And, again, my real question is
about understanding how the code works -- how the distributed memory
works.
As far as I know, you cannot recover from a configuration error unless
you've made sure host_dmax was set to a fairly small, non-zero value.
From the docs:
The host_dmax value is an integer with units in seconds. When set to
zero (0), gmond will never delete a host from its list even when a
remote host has stopped responding. If host_dmax is set to a positive
number then gmond will flush a host after it has not heard from it for
host_dmax seconds. By the way, dmax means ``delete max''.
This way, once a host's configuration is modified to point at a
different send channel, the aggregator(s) on its previous channel will
forget about its existence once the host_dmax interval expires.
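For example, something like this in the globals section of gmond.conf
(600 is just an illustrative value) should make a gmond forget about a
silent host after ten minutes:

globals {
  host_dmax = 600   /* seconds; 0 means never delete a host */
}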
Personally, I don't use multicast for various reasons, the main one
actually being its main advantage - every node keeps data on the entire
cluster. While this provides for maximal high availability, it also has
a bigger memory footprint, especially when you have a few thousand
nodes.
I'd much rather be ignored than have people try to pawn off facile
answers on me.
I'd provide you with more information on a possible setup which balances
high availability with performance, but I wouldn't want to overflow you
with useless data any more than I've done so far.
Let me know if you'd like more information.
Cheers,
Alex
Cheers, Chuck
Bernard Li wrote:
Hi Chuck:
For the first issue - give it time, it should sort itself out.
Alternatively, you can find out which node is reporting incorrect
information, and restart gmond on it.
For the second issue, you can group nodes into different data_source
entries via the multicast port in /etc/gmond.conf. Use the same port
number for nodes you want to belong to the same group.
You'll need to restart gmetad and gmond for the new groupings to take
effect.
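As a sketch of what that can look like (the host names and the 8650 port
below are made up for illustration), each group of gmonds shares one port
in /etc/gmond.conf, and gmetad.conf gets one data_source line per group:

/* /etc/gmond.conf on every node of one group, e.g. the staiu hosts */
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8650
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8650
  bind = 239.2.11.71
}
tcp_accept_channel {
  port = 8650
}

# /etc/gmetad.conf on the central collector
data_source "staiu" staiu01:8650 staiu02:8650
data_source "staqp05-08" staqp05:8649 staqp06:8649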
Cheers,
Bernard
------------------------------------------------------------------------
*From:* [EMAIL PROTECTED] on behalf of
Chuck Simmons
*Sent:* Wed 22/03/2006 17:54
*To:* [email protected]
*Subject:* [Ganglia-developers] reorganizing clusters
I need help understanding two things.
I currently have a grid. One of the clusters in the grid is named
"staiu", and the "grid" level web page reports that this cluster has 8
hosts containing 4 CPUs. In actuality, it has 8 hosts each containing 4
CPUs, but apparently the hosts are not reporting their current number of
CPUs to the front end. Why not? I recently restarted gmond on each of
the 8 hosts.
Another cluster is named "staqp05-08" and the "grid" level web page
reports that this has 12 hosts. In actual fact, it has only 4 hosts.
The extra 8 hosts are the 8 hosts of the 'staiu' cluster. On the
cluster level page for staqp05-08, the "choose a node" pull-down menu
lists the 8 staiu hosts, the "hosts up" number includes the staiu
hosts, and there are undrawn graphs for each of the staiu hosts in the
"load one" section. What do I have to do so that the web pages or gmond
daemons or whatever won't think that the staqp cluster contains the
staiu hosts? What is the specific mechanism that causes this
association to persist despite having shut down all staqp gmond daemons
and both the gmond and gmetad daemons on the web server simultaneously,
and then started up that collection of daemons?
Thanks, Chuck
_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers