Federico David Sacerdoti wrote:
One more email on this subject before we hit the mattresses for the new release.

I pity the mattresses.

So I think I am convinced that fixed-size names (expressed as hashes) are a good idea, provided we are careful and efficient with them.

Woohoo!

Each metric would carry a hash value as its name: short, compact, easy to process, and expressive when used with an internal mapping table (hash -> "fully-qualified metric name").

Wouldn't we save a lookup with "hash->pointer_to_main_metric_tree" and keep all the 'real' info in the big tree? Otherwise we check the map and then have to walk the metric tree (which, now that it's dynamic, could be very big or very small) every time we process a metric.
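A rough sketch of that single-probe idea (all class and metric names here are invented for illustration, not from the ganglia source): the table maps the name hash directly to a node reference in the metric tree, so handling a packet is one dictionary probe instead of a probe plus a tree walk.

```python
import hashlib

class MetricNode:
    """Hypothetical node in the dynamic metric tree."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.value = None

def name_hash(fq_name):
    """Reduce a fully-qualified metric name to a fixed-size key."""
    return hashlib.md5(fq_name.encode()).digest()

# Build a tiny tree: root -> cpu -> cpu/user.
root = MetricNode("root")
cpu = MetricNode("cpu", parent=root)
root.children["cpu"] = cpu
user = MetricNode("cpu/user", parent=cpu)
cpu.children["user"] = user

# hash -> pointer_to_main_metric_tree: the value is the node itself.
by_hash = {name_hash("cpu/user"): user}

# Processing an incoming metric is then a single lookup.
node = by_hash[name_hash("cpu/user")]
node.value = 42.5
```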

This means we need separate "branch" messages that create a branch. This is a good thing also because it allows us to specify attributes for a branch (which will be inherited by its children).

I still think there should be a way for a node to complain that it doesn't know a metric or branch and get a branch message retransmitted. Even if the "unknown metric" message is a short packet with the name of the unknown metric and a magic number, and that's it.
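To make that concrete, here is one possible shape for such a complaint packet (the magic number and field layout are made up, not an existing ganglia wire format): a 32-bit magic followed by the 16-byte hash of the unknown name.

```python
import struct

# Illustrative magic number for an "unknown metric" complaint packet.
UNKNOWN_METRIC_MAGIC = 0xDEADBEEF

def pack_unknown_metric(name_digest):
    """Build the short NACK: magic number + 16-byte name hash."""
    assert len(name_digest) == 16
    return struct.pack("!I16s", UNKNOWN_METRIC_MAGIC, name_digest)

def unpack_unknown_metric(packet):
    """Parse the NACK; raise if the magic number doesn't match."""
    magic, digest = struct.unpack("!I16s", packet)
    if magic != UNKNOWN_METRIC_MAGIC:
        raise ValueError("not an unknown-metric packet")
    return digest
```

A node that receives this would look up the digest and retransmit the matching branch message.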

I think that some branches should be "well-known" by all nodes. These can house standard metrics that ship with ganglia. These branches do not need to be explicitly described. This mechanism gives a nice way to bootstrap the metric tree and reduces the number of "branch" messages, especially in the common case where there are no user-defined ones.

Yes, but only *most* of our currently-implemented metrics can truly be considered "well-known," IMO (if "well-known" means that it's a metric supported by the monitoring core on *all* platforms).

Therefore only custom branches get sent during the send_all_metric_data() call in listen.c. This function is used to send all local metrics when a new gmond is discovered.

I love that function.

Finally, I suggest that we make the name sent in an XDR packet an MD5 hash of the fully-qualified metric name. This 128-bit hash is not too long, and since we do not know the names of user-defined branches a priori, hashing ensures collisions are a non-issue in practice. (MD5 cannot literally guarantee a unique value for every possible string, since it maps arbitrary input to a fixed 128 bits, but an accidental collision between metric names is astronomically unlikely.)
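For the record, MD5 digests are 16 bytes (128 bits), which is easy to check; the metric names below are made up for illustration:

```python
import hashlib

# MD5 maps any string to a fixed 128-bit (16-byte) digest.
a = hashlib.md5(b"builtin/cpu/cpu_user").digest()
b = hashlib.md5(b"custom/db/query_latency").digest()

assert len(a) == 16 and len(b) == 16   # 128 bits, not 160
assert a != b                          # distinct names, distinct digests
```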

I was actually going to suggest this but didn't want it shot down as overkill. :P

I think the DNS-like protocol is too much. Basically you're right:

But the problem's still there. Even though in all likelihood there's a one-in-a-million chance of a node missing the create-branch message, extensive perusal of Terry Pratchett novels shows that one-in-a-million circumstances come up ALL THE TIME.

Someone (one of the n-1 nodes) will know about the branch and multicast it.

But if every node is identical, won't each request get n-1 responses, or do we put an "if (i_am_the_eldest_node())" statement in there? :)

Anyway, let's say that doesn't work, or your six-fig Cisco monkey shoved a
banana in a switch somewhere and the "create-branch" message arrives after
the metric itself.

At this point we have two options:

*  Discard the metric data, process the create-branch data, wait for the
next metric transmission.  Straightforward but it means a hole in the data
for up to t_max and that'd be a bummer if it's one of those 15-minute
metrics.

*  Guess at adding the metric data based on the payload type of the XDR.
If you win and we have a string in there at least naming the actual metric,
then we sock it into an "uncategorized" branch and query/wait for the
branch data.  After the create-branch data is received, we update the
lookup hash and the metric hash to move the guessed metric into its
rightful place.  This is quite a bit more complicated, obviously.  And we
can't report this metric until its rightful place is secured.


We'll have to think about this case some more. You made good points.

I lean towards the former option. Not only is it simpler to code, but it makes more sense, and out-of-hierarchy metrics ... man, that wouldn't be good.

Anyway back to gmetad...

