Utsav,

You are correct. Since the collector gmond is down, gmetad will not be able to confirm that the cluster exists when it starts up again. You can see the cluster now only because the running gmetad remembers that it used to exist. After a restart, gmetad forgets about the cluster until the collector nodes come back.

Ian

Utsav Agarwal wrote:

Ian,

Thanks for pointing out the bug report. Yes, that bug describes exactly what we are seeing. From the report it looks like gmond has been fixed, but the web frontend/gmetad has not been fixed yet. If we were to restart gmetad, I assume we would no longer be able to see the cluster's (old) data, since the collector gmond is down?

Thanks,

Utsav.

------------------------------------------------------------------------

*From:* Ian Cunningham [mailto:[EMAIL PROTECTED]
*Sent:* Monday, December 05, 2005 4:47 PM
*To:* Utsav Agarwal
*Cc:* [email protected]
*Subject:* Re: [Ganglia-general] gmetad/php bug?

Utsav,

This may be covered in this bug: http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=39

If you don't want to see that cluster anymore, you could restart your gmetads. But then you lose access to your old data from the web.

Ian

Utsav Agarwal wrote:

Old structure (worked well):

----------------------------------------

Node A (unicast) --> gmond collector node (Node B) <-- gmetad poll (Node C) <-- summary gmetad poll (Node D)

What's changed:

-------------------------

Nodes A and B were part of a cluster that doesn't exist anymore, i.e. the cluster has been retired. Node A and Node B themselves no longer exist either.

What's working/not-working:

-----------------------------------------

The summary Gmetad on Node D reports Node A as down (correct).

The Gmetad process on Node C shows Node A as up on the summary page (wrong). However, it does report the date of the last update as 1 week ago (the nodes have been down since then).

The Gmetad process on Node C shows Node A as down at the node level (correct).

Any explanation will be helpful.

Example output (for Node A and another node that is up) from telnet on Node C:

-----------------------------------------------------------------------------------------------------------------------

<HOST NAME="Node A" IP="a.b.c.d" REPORTED="1132685684" TN="1134644" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1131061840">

<HOST NAME="Working Node" IP="w.x.y.z" REPORTED="1133820125" TN="212" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1130851765">
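To make the two lines above easier to compare, here is a minimal Python sketch that pulls out the timing attributes. It assumes REPORTED is a Unix timestamp of the last report, TN is the number of seconds since that report, and TMAX is the maximum expected number of seconds between reports. The helper names (parse_host, silent_intervals) are made up for illustration and are not part of Ganglia.

```python
import re
from datetime import datetime, timezone

# Matches KEY="value" pairs; a regex is used because the pasted
# <HOST ...> lines are unclosed fragments, not a complete XML document.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_host(line):
    """Extract the timing attributes from one <HOST ...> line (hypothetical helper)."""
    attrs = dict(ATTR_RE.findall(line))
    return {
        "name": attrs["NAME"],
        # Assumption: REPORTED is seconds since the Unix epoch.
        "last_report": datetime.fromtimestamp(int(attrs["REPORTED"]), tz=timezone.utc),
        "tn": int(attrs["TN"]),      # seconds since the last report
        "tmax": int(attrs["TMAX"]),  # max expected seconds between reports
    }


def silent_intervals(host):
    """How many TMAX-sized intervals have passed without a report."""
    return host["tn"] / host["tmax"]


SAMPLE = [
    '<HOST NAME="Node A" IP="a.b.c.d" REPORTED="1132685684" TN="1134644" '
    'TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1131061840">',
    '<HOST NAME="Working Node" IP="w.x.y.z" REPORTED="1133820125" TN="212" '
    'TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1130851765">',
]

HOSTS = {h["name"]: h for h in map(parse_host, SAMPLE)}

if __name__ == "__main__":
    for host in HOSTS.values():
        print(
            host["name"],
            host["last_report"].isoformat(),
            "TN=%ds" % host["tn"],
            "(%.1f x TMAX)" % silent_intervals(host),
        )
```

Read this way, Node A's TN is tens of thousands of times its TMAX, while the working node's is only about ten times. Whatever threshold a frontend applies to decide "down" is not shown in this output, so the sketch reports the ratio rather than guessing a cutoff.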

Thanks,

Utsav.

------------------------------------------------------------------------------------

Utsav Agarwal

Systems Analyst

eXcellence in IS Solutions (X-ISS)

------------------------------------------------------------------------------------
