Utsav,
You are correct. Because the collector gmond is down, gmetad will not
be able to confirm that the cluster exists when it starts up again.
You can see the cluster now only because gmetad remembers that it used
to exist. After a restart, it forgets about the cluster until the
collector nodes come back.
Ian
Utsav Agarwal wrote:
Ian,
Thanks for pointing out the bug report. Yes, that bug describes exactly
what we are seeing. From the report it looks like gmond has been
fixed but the web frontend/gmetad has not been fixed yet. If we were
to restart gmetad, I assume we would not be able to see the cluster's
(old) data, since the collector gmond is down?
Thanks,
Utsav.
------------------------------------------------------------------------
*From:* Ian Cunningham [mailto:[EMAIL PROTECTED]
*Sent:* Monday, December 05, 2005 4:47 PM
*To:* Utsav Agarwal
*Cc:* [email protected]
*Subject:* Re: [Ganglia-general] gmetad/php bug?
Utsav,
This may be covered in this bug:
http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=39
If you don't want to see that cluster anymore, you could restart your
gmetads. But then you lose access to your old data from the web.
Ian
Utsav Agarwal wrote:
Old structure (worked well):
----------------------------------------
Node A (unicast) --> gmond collector node (Node B) <-- gmetad poll (Node C)
<-- summary gmetad poll (Node D)
What's changed:
-------------------------
Nodes A and B were part of a cluster that has been retired. Neither
node exists anymore.
What's working/not-working:
-----------------------------------------
The summary gmetad on Node D reports Node A as down (correct).
The gmetad process on Node C shows Node A as up on the summary page
(wrong). However, it does report the last date of update as 1 week ago
(the nodes have been down since then).
The gmetad process on Node C shows Node A as down at the node level
(correct).
Any explanation will be helpful.
Example output (for Node A and another node that is up) from telnet on
Node C:
-----------------------------------------------------------------------------------------------------------------------
<HOST NAME="Node A" IP="a.b.c.d" REPORTED="1132685684" TN="1134644"
TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1131061840">
<HOST NAME="Working Node" IP="w.x.y.z" REPORTED="1133820125" TN="212"
TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1130851765">
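
For anyone reading the output above, here is a minimal Python sketch for
pulling the attributes out of one HOST tag and turning them into
something readable. It assumes REPORTED is a Unix timestamp of the last
report and TN is the number of seconds since that report; the helper
names parse_host and last_report are made up for illustration and are
not part of Ganglia. The tags are matched with a regular expression
rather than an XML parser because the excerpt is not a complete document.

```python
import re
from datetime import datetime, timezone

# Matches NAME="value" pairs inside a single tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_host(tag):
    """Return the attributes of one <HOST ...> tag as a dict of strings."""
    return dict(ATTR_RE.findall(tag))


def last_report(attrs):
    """Return (UTC datetime of REPORTED, days since last report per TN).

    Assumes REPORTED is a Unix timestamp and TN is in seconds.
    """
    reported = datetime.fromtimestamp(int(attrs["REPORTED"]), tz=timezone.utc)
    age_days = int(attrs["TN"]) / 86400
    return reported, age_days


if __name__ == "__main__":
    tags = [
        '<HOST NAME="Node A" IP="a.b.c.d" REPORTED="1132685684" '
        'TN="1134644" TMAX="20" DMAX="0">',
        '<HOST NAME="Working Node" IP="w.x.y.z" REPORTED="1133820125" '
        'TN="212" TMAX="20" DMAX="0">',
    ]
    for tag in tags:
        attrs = parse_host(tag)
        reported, age_days = last_report(attrs)
        print(f'{attrs["NAME"]}: last report {reported:%Y-%m-%d %H:%M:%S} UTC, '
              f'{age_days:.2f} days ago')
```

Comparing the age against TMAX (or some multiple of it) is one way to
decide whether a host should count as down, but the threshold to use is
a local choice and is not taken from this thread.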
Thanks,
Utsav.
------------------------------------------------------------------------------------
Utsav Agarwal
Systems Analyst
eXcellence in IS Solutions (X-ISS)
------------------------------------------------------------------------------------