Jason,
 
I agree with Jason, but I would make it even simpler. While you are
becoming familiar with Ganglia, it is (in my opinion) a lot simpler to
configure the gmonds to use unicast to one member of the cluster, which
will become your headnode.
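 
A minimal sketch of such a unicast setup, assuming the 3.x-style
gmond.conf syntax. headnode.example.com is a placeholder hostname and
8649 is the usual gmond default port, so adjust both for your site:

```
/* On every node: send metrics by unicast to the headnode only */
udp_send_channel {
  host = headnode.example.com   /* placeholder hostname */
  port = 8649
}

/* On the headnode only: receive the metrics and serve the XML */
udp_recv_channel {
  port = 8649
}
tcp_accept_channel {
  port = 8649
}
```

With this layout only the headnode holds cluster state, which is why a
single restart there is enough to clear it.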
 
If you do that and you want to flush incorrect information out, then
you only have to restart the headnode gmond daemon. Also, in the
gmond.conf files you can set:
 
globals {
  setuid = no
  user = nobody
  cleanup_threshold = 200 /*secs */
  host_dmax = 172800 /* secs; 2 days before dead hosts disappear from the web site */
}
 
I believe host_dmax is not in the default configuration; you can set it
to a short value while you are experimenting.
 
Also, may I suggest you get and compile netcat (nc) if you do not
already have it. Using netcat instead of telnet makes it much easier to
redirect the output to a file or pipe it into a script.
 
For example, I have a dead simple script that reports dead hosts across
the many many clusters I have:
 
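#!/usr/bin/perl
# Report hosts whose time since last heartbeat (TN) exceeds 10x TMAX.
use strict;
use warnings;

my $cluster = "";
# Read the Ganglia XML dump through netcat.
open (my $fd, "nc ganglia-pilot 8651 |") or die "netcat: $!";
while (<$fd>) {
        # Remember which cluster the following HOST lines belong to.
        ($cluster) = / NAME="([^"]*)" / if (/^<CLUSTER NAME/);
        next unless (/^<HOST NAME/);
        my ($host,$TN,$TMAX) = / NAME="([^"]*)".*TN="([^"]*)".*TMAX="([^"]*)"/;
        next unless (defined $host);
        printf("%s (%s) has not reported for %.1f hours (TN=%s TMAX=%s)\n",
               $host, $cluster, $TN/(60*60), $TN, $TMAX)
                if ($TN > 10*$TMAX);
}
close ($fd);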
 
yielding, for example:
ldnpsm020001743.intranet.barcapint.com (QANET UAT) has not reported for 508.1 hours (TN=1829052 TMAX=20)
ldnpsm02fb00587.intranet.barcapint.com (EDT QANetOffsys6) has not reported for 3.9 hours (TN=14217 TMAX=20)
ldnpsm02fa00144.intranet.barcapint.com (EDT QANetOffsys6) has not reported for 2.6 hours (TN=9272 TMAX=20)
ldnpsm02fa00135.intranet.barcapint.com (EDT QANetOffsys6) has not reported for 2.9 hours (TN=10414 TMAX=20)
ldnpsm020001338.intranet.barcapint.com (LDN FIP QA Qasys PDN) has not reported for 0.2 hours (TN=692 TMAX=20)
ldnpsm020002397.intranet.barcapint.com (LDN FIP QA USD Exotics PDN) has not reported for 25.1 hours (TN=90472 TMAX=20)
ldnpsm02fb00424.intranet.barcapint.com (EDT QANetOffsys5) has not reported for 3.4 hours (TN=12151 TMAX=20)

Easy!
 
kind regards,
Richard

