Re: [Ganglia-developers] One more question

Federico Sacerdoti Fri, 06 Dec 2002 10:20:43 -0800

I'll try to answer all of these.

On Thursday, December 5, 2002, at 09:23 PM, [EMAIL PROTECTED] wrote:

Federico & Steven, I really appreciate your thoughts about the Ganglia front-end architecture.
I have one more question. Is gmetad robust? If I've got this right, gmond maintains only the latest metric values received for the cluster. If all of the gmetads go down, aren't all the values during that time period lost forever?

If a gmetad goes down, it stops recording metric value history. When it comes up, this will show as a gap in the graphs.

If at least one gmetad stays up, then when others come up and pull the xml description from the gmetad that survived, will they merge all of the values missing from their own rrd?

This does not happen. Gmetads are not robust the way gmonds are. They do not attempt to "bring newcomers up to date" as gmond does. This has to do with security: how do we know you deserve the old data? With gmond, the security is implicit in being part of the multicast channel.

Also, the rrds are very timestamp sensitive. Even if we did give a recovering gmetad data for its gaps, small clock skews would make the graphs look terrible. Not that this isn't something we could overcome with careful engineering, however. Our assumption is that gmetads are running on dedicated monitoring hardware that is hand administered and possibly redundant. If a gmetad goes down, an operator can copy the rrd files from a surviving gmetad to fill in the gaps. However, in practice, gaps are not that big of a deal, and don't degrade performance or correctness like a gmond failure does.
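
To illustrate the timestamp sensitivity, here is a minimal Python sketch of how a small clock skew can push samples into the wrong time slot. It is a simplified model, not rrdtool's actual consolidation logic: the 15-second step, the function names, and the idea of flooring each timestamp to a slot boundary are all assumptions made for illustration.

```python
STEP = 15  # assumed slot width in seconds (hypothetical, not a Ganglia default)


def slot_of(timestamp, step=STEP):
    """Return the start of the fixed-width time slot containing timestamp."""
    return timestamp - timestamp % step


def misplaced_samples(true_times, skew, step=STEP):
    """Return the true sample times that land in a different slot
    when the recording clock is off by `skew` seconds."""
    return [t for t in true_times
            if slot_of(t + skew, step) != slot_of(t, step)]


if __name__ == "__main__":
    # With a 10-second skew, samples late in a slot spill into the next one.
    print(misplaced_samples([90, 100, 104, 105], skew=10))
```

In this toy model a skew smaller than the step still misplaces every sample that falls close enough to a slot boundary, which is the kind of effect that would distort a merged graph.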

If so, how do you know at any given time whether a particular gmetad is up to date?

A gmetad always makes graphs based on fresh data. If it is drawing anything on the left side of a graph, it is up to date. Otherwise it is dead. If there are gaps in the graph, it means the gmetad was down for that period of history. I may be misunderstanding your question here.
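
The same two checks can be sketched in a few lines of Python: whether the most recent sample is fresh, and where the recorded history has gaps. This is only a sketch under stated assumptions. The function names, the 60-second freshness limit, and the gap tolerance are hypothetical, and the timestamps are assumed to be UNIX seconds obtained by some other means (for example, `rrdtool last <file>` prints the time of an RRD's most recent update).

```python
def is_up_to_date(last_update, now, max_age=60):
    """True if the newest sample is at most max_age seconds old.
    max_age is an assumed threshold, not a Ganglia setting."""
    return now - last_update <= max_age


def find_gaps(timestamps, step, tolerance=2):
    """Return (start, end) pairs where consecutive samples are more than
    tolerance * step seconds apart, i.e. periods with no recorded history."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if b - a > tolerance * step]


if __name__ == "__main__":
    samples = [0, 15, 30, 120, 135]  # made-up sample times, 15-second step
    print(is_up_to_date(samples[-1], now=150))
    print(find_gaps(samples, step=15))
```

A gap reported here corresponds to a stretch where the gmetad was down; a stale newest sample corresponds to a gmetad that is currently dead.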

What advice would you give in terms of the gmetad to gmond ratio? For maximum redundancy, should every node run both gmond and gmetad?

Since keeping metric history with RRD databases is computation- and I/O-intensive, I would not suggest this. We keep a gmetad service running on the frontend node of a cluster, that is, one gmetad for the cluster.


Jonathan


Hope this helps,
Federico

Rocks Cluster Group, Camp X-Ray, SDSC, San Diego
GPG Fingerprint: 3C5E 47E7 BDF8 C14E ED92  92BB BA86 B2E6 0390 8845
