[Ganglia-developers] hierarchical metric naming (long)

Federico Sacerdoti Fri, 30 Aug 2002 12:33:13 -0700

So, as Steven and others have mentioned, we have a problem with ganglia metrics. Metrics currently lie in a flat namespace, with no hierarchical groupings. I have talked with Matt and Mason (a ganglia developer and my boss) about this problem, and would like to state and define some of our ideas.

Through phone discussions with Matt, I've come to like the idea of a <BRANCH name="CPU"> type tag. This would enable hierarchical namespaces. An important question is why would we want these? There are several reasons.

Elements in a branch need to share some common semantics. One example of a shared semantic is all metrics under the CPU branch could get displayed in the same section of a webpage. Therefore, if a new cpu metric was introduced, say "L1 cache size", the web page would automatically put it in the correct section.

Another advantage of hierarchies comes from object-oriented design. Attributes in the Branch tag, such as DMAX (when metrics get deleted), become the default for all metrics below it. These can be overridden by the individual metrics, analogous to overriding base class methods in an OO class tree. This gives an easy way to assign attribute values to a group of metrics.
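As a minimal sketch of this kind of attribute inheritance (the Node class, its get method, and the DMAX values are hypothetical, chosen only for illustration and not taken from ganglia code), a lookup can walk from a metric up through its ancestors until it finds a node that sets the attribute:

```python
class Node:
    """A node in a hypothetical metric tree: either a branch or a leaf metric."""

    def __init__(self, name, parent=None, **attrs):
        self.name = name
        self.parent = parent
        self.attrs = attrs  # only the attributes set explicitly on this node

    def get(self, key):
        """Return the nearest value for key, walking up toward the root."""
        node = self
        while node is not None:
            if key in node.attrs:
                return node.attrs[key]
            node = node.parent
        raise KeyError(key)


cpu = Node("CPU", DMAX=3600)              # branch-wide default (made-up value)
num = Node("num", parent=cpu)             # inherits DMAX from the CPU branch
idle = Node("idle", parent=cpu, DMAX=60)  # overrides the branch default
```

Here num.get("DMAX") falls back to the CPU branch value, while idle.get("DMAX") returns its own override.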

A third advantage is cleaner namespaces. You can call 'cpu_num' simply 'num'. Similar naming simplifications are possible for the other metrics. The most significant advantage is that we only have to worry about name collisions among siblings in the tree. There can be a 'num' metric in another branch (for example, the 'num' of network interfaces).

So how do we name metrics in the XDR packet if we adopt a metric hierarchy? This is a difficult problem, since we want to allow new metrics to appear at any time. Imagine an XDR packet comes in. We need to identify the metric, and update its value in our hash tables.

We know which host sent the packet from its source address, so we look at our hash table for that host. Now how do we know which branch it goes in? Mason suggested that each XDR metric packet contains the name of its most immediate ancestor in the metric tree. The thinking is that since order does not matter among siblings, we could search the tree until we found the correct branch, and put the metric under it. This would require the XDR packet to carry a minimal amount of information, and the only restriction would be each branch must have a unique name.
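Mason's suggestion could be sketched roughly as follows, assuming a packet has already been decoded into a parent branch name, a metric name, and a value. The Branch class and both function names are hypothetical, and the search relies on the restriction that branch names are unique across the whole tree:

```python
class Branch:
    """A branch in one host's metric tree (hypothetical structure)."""

    def __init__(self, name):
        self.name = name
        self.branches = {}  # child branch name -> Branch
        self.metrics = {}   # metric name -> latest value


def find_branch(root, name):
    """Depth-first search for the branch with the given (unique) name."""
    stack = [root]
    while stack:
        branch = stack.pop()
        if branch.name == name:
            return branch
        stack.extend(branch.branches.values())
    return None


def place_metric(root, parent_name, metric_name, value):
    """Store the metric under its named parent; return False if unknown."""
    branch = find_branch(root, parent_name)
    if branch is None:
        return False
    branch.metrics[metric_name] = value
    return True


# Example tree: root -> system -> memory
root = Branch("root")
system = Branch("system")
memory = Branch("memory")
root.branches["system"] = system
system.branches["memory"] = memory
```

With this tree, place_metric(root, "memory", "free", 1024) finds the nested branch by name alone, and a packet naming an unknown parent is reported as unplaceable.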

To update the hash table effectively, each sibling must have a unique ID. If we get an XDR packet for metric "memory->free", we find the "memory" branch, and put a "free" metric under it. If at a later time we receive another packet for the same metric, we look for the "free" metric, and update its value. I suggest that like user-defined metrics, we use a string as the hash key. In this case, the string "memory" would key the branch hash table, and "free" would key the metric.
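A minimal sketch of the string-keyed update, using nested dictionaries as stand-ins for the hash tables. The update function, the sample address, and the values are made up for illustration, and the sketch assumes every branch of a host can be reached directly by its unique name:

```python
from collections import defaultdict

# host address -> branch name -> metric name -> latest value
hosts = defaultdict(lambda: defaultdict(dict))


def update(hosts, source_addr, branch, metric, value):
    """Insert the metric on first sight, overwrite its value afterwards."""
    hosts[source_addr][branch][metric] = value


update(hosts, "10.0.0.5", "memory", "free", 1024)  # first packet creates "free"
update(hosts, "10.0.0.5", "memory", "free", 2048)  # later packet updates it
update(hosts, "10.0.0.5", "cpu", "num", 4)
update(hosts, "10.0.0.5", "network", "num", 2)     # same name, other branch
```

The two 'num' metrics do not collide because each is keyed only within its own branch table.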

The problem is that the "memory" branch may itself lie along several branches. When a host sends us a "memory->free" metric, do we even know about the "memory" branch? Let's say we require a host to send a "create-branch" message before sending metrics for that branch. Well, what if some nodes miss the create message, or worse yet, they are just recovering from a crash? They would not know about the "memory" branch at all.

I believe the answer is that new nodes get their branch hierarchy all at once from the oldest gmond in the cluster (which I will call the eldest node). Matt has been talking about this for some time, as it will solve some other problems as well. If we get an XDR metric packet that specifies an unknown branch, we discard it. However, we realize that we must have missed something, so we query the eldest node for its metric hierarchy. If we can't find the eldest node, we query the second eldest, etc. We also query the second eldest if we didn't learn anything new from the eldest itself. (This solves the problem of the eldest node having incomplete information.)

The assumption is that the eldest node has been listening to all the "create-branch" messages, and has a complete metric tree.
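The recovery procedure could look roughly like the sketch below. Everything here is an assumption made for illustration: a node's hierarchy is reduced to a set of branch names, peers_by_age is a list of peers ordered eldest first, and fetch is a hypothetical callable that returns a peer's branch names, or None if that peer cannot be reached:

```python
def resync(known, peers_by_age, fetch):
    """Query peers eldest-first until one teaches us a branch we lacked.

    known is updated in place. Returns the peer that supplied new
    branches, or None if no reachable peer knew anything new.
    """
    for peer in peers_by_age:
        remote = fetch(peer)
        if remote is None:
            continue  # peer not found: try the next eldest
        new = remote - known
        if new:
            known |= new
            return peer
        # learned nothing new from this peer: try the next eldest
    return None


def handle_packet(known, branch, peers_by_age, fetch):
    """Return True to accept the packet; otherwise discard it and resync."""
    if branch in known:
        return True
    resync(known, peers_by_age, fetch)
    return False
```

For example, if the eldest node is unreachable and the second eldest knows only what we already know, the third eldest is the one that ends up supplying the missing branch. The packet that triggered the query is still discarded, and later packets for that branch are accepted.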

This email message is getting too long, but I could go on about how we could use the idea of database indexes to quickly locate any branch in the tree.

I hope I have been relatively clear about these ideas. I realize this problem is pretty dense, and this solution is in its infancy. But the point I would like to drive home is that a naming hierarchy is helpful for specific reasons, and that its efficient implementation is possible in the ganglia framework.


Sincerely,
Federico

Rocks Cluster Group, Camp X-Ray, SDSC, San Diego
GPG Fingerprint: 3C5E 47E7 BDF8 C14E ED92  92BB BA86 B2E6 0390 8845
