Federico Sacerdoti wrote:
I like your comments. Here are some counter-questions :)
I like your counter-questions. Here are some counter-counter-questions,
interspersed with wisecracks like this one.
On Friday, August 30, 2002, at 02:34 PM, Steven Wagner wrote:
It seems to me this would also make the "DSO-ification" of the
monitoring core a smoother process, not to mention a cleaner one from
the standpoint of those developing the DSOs. :)
Good point.
I am particularly interested in laying the groundwork for that so that we
can upgrade the monitoring core here with a vanilla RPM and not worry about
hosing any future proprietary metrics.
So I am thinking that sending the fully-qualified metric name (as shown
above) is a better idea now - it handles failures more effectively. When
a node comes up it would receive metrics that look like
"host1/cpu/cache/size" (fully-qualified with all the metric's ancestors)
instead of "cache/size" (relative as I had suggested previously). This
fits in with Steven's idea of hosts being authoritative for branches
they created - each metric specifies its branches explicitly. It also
reduces reliance on an elder node for the branch hierarchy.
I side with optimizing the packet size if the choice is between having
small packet sizes and small/easy-to-follow data structures in the
monitoring core. My rationale for this is that the number of metrics
probably won't scale beyond fifty or a hundred values, if that. I am
somewhat more sensitive to inflating the packet size, which will cause a
geometric increase in network traffic as the number of metrics and/or nodes
grows.
The node hostnames here are 12 characters long (and the hostname metric
reports their FQDNs). I like the idea of identifying the hosts separately,
using a metric transmitted sparingly, instead of tacking the hostname onto
every metric name the host transmits. The hierarchical scheme is definitely
the right way to go as far as naming goes (debug/metadata display), but I
believe the hostname should be decoupled from the transmitted metric.
And I'm making these points not necessarily because I adore the sublime
beauty of a 128-byte XDR packet versus a 192-byte XDR packet, but... well,
I guess that's exactly what I'm doing. :) Part of the elegance of the
program is its compactness and simplicity at the network level, after all.
Maintaining a separate hash, roughly (strlen(metricname) + 1 +
sizeof(uint32_t)) * num_of_metrics bytes in size, that has XDR metric names
as keys and an internal metric index as values doesn't strike me as too
icky from a memory/performance standpoint either. Plus you only ever do two
hash table lookups to reach any metric, as opposed to walking an n-branch
metric tree (which, if you get right down to it, is probably at least three
levels deep: hostname, metric-main-type, actual-metric-value, i.e.
host1.cpu.num).
All of my comments of course focus on enhancing the flexibility of metrics
from the multicast side, not the formatting/output of the XML dump.
There's no explicit provision mentioned for handling the proposed metric
hierarchy.
This way a node can easily create branches as needed for any metric it
receives.
About the "hash for storing fully qualified metric names (FQMN :)". How
would we populate such a hash? At some level, the metric must specify
its fully-qualified name, so we know where to put it. A hash value is no
good if we don't already have the name stored. How would you handle new
metrics? I think we could run-length encode the name strings to save
space if we need to, but having each metric carry its full name seems
clearer to me.
At some point a node is going to need each branch that contains a metric to
be described to it in detail. Once that description has been parsed and
added to the internal metric hash, the monitoring core notes the XDR metric
key and adds it to the XDR->internal hash along with a pointer to the
newly created entry. This second hash doesn't handle hierarchy at all - but
parent and child(ren) fields should be present in the entry the lookup-hash
pointer references.
This way, every metric-related message each monitoring core receives will
"resolve" to something, and from that you should be able to compute its FQMN.
I imagine a hash_find(node, "cpu", "cache") function that takes a
variable number of arguments to locate the hash table to insert a given
metric in (the metric here: host1/cpu/cache/size). The 'node' argument
specifies the root of the metric tree - the node hash table for host1.
Note each branch would get its own hash table so that hash_foreach()
will work correctly and printing the XML will be easy.
To make this work, we simply add a 'hash_t *branch' member to the
metric_data_t structure. If branch==NULL then we are a leaf (actual
metric), else this is a branch that points to another hash table. I can
visualize the XML output code now...
Sounds like we're actually talking about the same thing. I'm just
advocating a different method on the XDR side.
But yeah, the XML output code will not be too tough. :)
Dense, yes, but the area of metrics is just about the only one in the
Ganglia design that *doesn't* scale well (kudos, Matt & co.). I'm
sure that we can work this out if we just keep banging those rocks
together. :)
Clever ;)
Just a little homage to the late Douglas Adams. For those of you at home
keeping score, the original line goes something like this:
"... this is the Sub-Etha News Network, broadcasting to civilized beings
throughout the galaxy. For those of you still living in caves, the secret
is to keep banging those rocks together, guys..."
Do people like the Java-like dot notation for hierarchical names, like
"host1.cpu.cache.size", or the Unix filesystem forward-slash notation,
"host1/cpu/cache/size"? I like the slashes because it's easy to tell if
you're talking about a leaf or a branch: "host1/cpu/" is clearly a
branch, while "host1.cpu." is a little harder to read. But either way
would work.
I always liked URLs. Besides, everybody knows what a URL looks like. At
least, everybody who's running gmond does, I hope ...
gmond://host[.cluster?]/cpu/1/idle_percentage
[etc.]
I'm psyched about this change, and I am ready to dive in right after the
2.5.0 release.
Me too. I'm itching to write up a platform-agnostic /proc-walking algorithm
I've been kicking around for a few days, but I don't want to even get into
it until 2.5.0 gets out the door.