Federico David Sacerdoti wrote:
One more email on this subject before we hit the mattresses for the new release.

I pity the mattresses.

So I think I am convinced that fixed-size names (expressed as hashes) are a good idea, provided we are careful and efficient with them.

Woohoo!

Each metric would carry a hash value as its name: short, compact, easy to process, and expressive when used with an internal mapping table (hash -> "fully-qualified metric name").

Wouldn't we save a lookup with "hash->pointer_to_main_metric_tree" and keep all the 'real' info in the big tree? Otherwise we check the map and then have to walk the metric tree (which, now that it's dynamic, could be very big or very small) every time we process a metric.
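A rough sketch of that single-probe idea (all class and metric names here are invented for illustration, not from the ganglia source): the table maps the name hash directly to a node reference in the metric tree, so handling a packet is one dictionary probe instead of a probe plus a tree walk.

```python
import hashlib

class MetricNode:
    """Hypothetical node in the dynamic metric tree."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.value = None

def name_hash(fq_name):
    """Reduce a fully-qualified metric name to a fixed-size key."""
    return hashlib.md5(fq_name.encode()).digest()

# Build a tiny tree: root -> cpu -> cpu/user.
root = MetricNode("root")
cpu = MetricNode("cpu", parent=root)
root.children["cpu"] = cpu
user = MetricNode("cpu/user", parent=cpu)
cpu.children["user"] = user

# hash -> pointer_to_main_metric_tree: the value is the node itself.
by_hash = {name_hash("cpu/user"): user}

# Processing an incoming metric is then a single lookup.
node = by_hash[name_hash("cpu/user")]
node.value = 42.5
```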

This means we need separate "branch" messages that create a branch. This is a good thing also because it allows us to specify attributes for a branch (which will be inherited by its children).

I still think there should be a way for a node to complain that it doesn't know a metric or branch and get a branch message retransmitted. Even if the "unknown metric" message is a short packet with the name of the unknown metric and a magic number, and that's it.
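To make that concrete, here is one possible shape for such a complaint packet (the magic number and field layout are made up, not an existing ganglia wire format): a 32-bit magic followed by the 16-byte hash of the unknown name.

```python
import struct

# Illustrative magic number for an "unknown metric" complaint packet.
UNKNOWN_METRIC_MAGIC = 0xDEADBEEF

def pack_unknown_metric(name_digest):
    """Build the short NACK: magic number + 16-byte name hash."""
    assert len(name_digest) == 16
    return struct.pack("!I16s", UNKNOWN_METRIC_MAGIC, name_digest)

def unpack_unknown_metric(packet):
    """Parse the NACK; raise if the magic number doesn't match."""
    magic, digest = struct.unpack("!I16s", packet)
    if magic != UNKNOWN_METRIC_MAGIC:
        raise ValueError("not an unknown-metric packet")
    return digest
```

A node that receives this would look up the digest and retransmit the matching branch message.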

I think that some branches should be "well-known" by all nodes. These can house standard metrics that ship with ganglia. These branches do not need to be explicitly described. This mechanism gives a nice way to bootstrap the metric tree and reduces the number of "branch" messages, especially in the common case where there are no user-defined ones.

Yes, but only *most* of our currently-implemented metrics can truly be considered "well-known," IMO (if "well-known" means that it's a metric supported by the monitoring core on *all* platforms).

Therefore only custom branches get sent during the send_all_metric_data() call in listen.c. This function is used to send all local metrics when a new gmond is discovered.

I love that function.

Finally, I suggest that we make the name sent in an XDR packet an MD5 hash of the fully-qualified metric name. This 128-bit hash is not too long, and since we do not know the names of user-defined branches a priori, hashing ensures collisions are a non-issue in practice. (MD5 cannot literally guarantee a unique value for every possible string, since it maps arbitrary input to a fixed 128 bits, but an accidental collision between metric names is astronomically unlikely.)
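For the record, MD5 digests are 16 bytes (128 bits), which is easy to check; the metric names below are made up for illustration:

```python
import hashlib

# MD5 maps any string to a fixed 128-bit (16-byte) digest.
a = hashlib.md5(b"builtin/cpu/cpu_user").digest()
b = hashlib.md5(b"custom/db/query_latency").digest()

assert len(a) == 16 and len(b) == 16   # 128 bits, not 160
assert a != b                          # distinct names, distinct digests
```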

I was actually going to suggest this but didn't want it shot down as overkill. :P

I think the DNS-like protocol is too much. Basically you're right:

But the problem's still there. Even though in all likelihood there's a one-in-a-million chance of a node missing the create-branch message, extensive perusal of Terry Pratchett novels shows that one-in-a-million circumstances come up ALL THE TIME.

Someone (one of the n-1 nodes) will know about the branch and multicast it.

But if every node is identical, won't each request get n-1 responses, or do we put an "if (i_am_the_eldest_node())" statement in there? :)

Anyway, let's say that doesn't work, or your six-fig Cisco monkey shoved a
banana in a switch somewhere and the "create-branch" message arrives after
the metric itself.

At this point we have two options:

*  Discard the metric data, process the create-branch data, wait for the
next metric transmission.  Straightforward but it means a hole in the data
for up to t_max and that'd be a bummer if it's one of those 15-minute
metrics.

*  Guess at adding the metric data based on the payload type of the XDR.
If you win and we have a string in there at least naming the actual metric,
then we sock it into an "uncategorized" branch and query/wait for the
branch data.  After the create-branch data is received, we update the
lookup hash and the metric hash to move the guessed metric into its
rightful place.  This is quite a bit more complicated, obviously.  And we
can't report this metric until its rightful place is secured.


We'll have to think about this case some more. You made good points.

I lean towards the former option. Not only is it simpler to code, but it makes more sense, and out-of-hierarchy metrics ... man, that wouldn't be good.

Anyway back to gmetad...

