matt massie wrote:
Today, Federico Sacerdoti wrote forth saying...


I don't understand. Floats and doubles often will be generated in binary.
For example a detailed running time from MPI_Wtime() or a GPS
coordinate. No matter how you slice it, storing it in ascii means a
conversion. That conversion loses precision. We could store the
exponent and mantissa as separate integers in ascii, but that is a
different strategy with its own problems.
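
For what it's worth, how much precision the conversion costs depends on how many digits get written. A minimal sketch in Python (not ganglia code; `to_ascii` and the sample value are made up for illustration): an IEEE 754 double survives a trip through text when it is written with 17 significant digits, and does not when it is written with 6.

```python
def to_ascii(value, digits=17):
    """Format a double as ASCII text with `digits` significant digits."""
    return f"{value:.{digits}g}"

# Hypothetical MPI_Wtime()-style reading, chosen only for illustration.
sample = 1057623412.123456789

full = to_ascii(sample)        # 17 significant digits
short = to_ascii(sample, 6)    # 6 significant digits

print(float(full) == sample)   # True: the text converts back to the same double
print(float(short) == sample)  # False: digits were dropped
```

The cost of the full-precision form is longer strings on the wire and in memory.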


it's true that many of the system/library calls we use to collect metrics
return floats/doubles in binary form.  even if we send/save the data in
binary form we still have to convert it once we want to output xml (since
xml is text).  if we use ascii on the wire then each node is responsible
to convert its binary to ascii instead of having a single machine convert
every value for every host on the fly.  the only difference is where the
conversion is happening... not if it happens or not.  it makes more sense
to distribute the conversion since it will make gmond answer MUCH faster.
your benchmarks on gmond demonstrated that gmond could only provide about
6 samples/sec for a cluster of 100 nodes.  that's largely because of the
thousands of binary conversions that it needs to make.  if everything in
memory is ascii.. no conversions are necessary.

So... assuming we use the XDR-push-to-metadaemon model, what if the metadaemon's data source thread does the binary-to-ascii conversions before storing the data? It's after the data has been captured and time-stamped, so at that point it's no longer time-sensitive (since you're not relying on a data-storage-subsystem-generated timestamp) - if it takes ten seconds to convert all them thar reals to chars, it's more acceptable than hitting a monitoring core with the same task.

Would it be feasible to add another metric attribute for floats that governs precision? Maybe it just needs to be part of the metric collection code - supply a hook for a conversion function in the ganglia library, but make sure that devs know that they can't return floats (or perhaps just "caveat coder").
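
Roughly what I have in mind, as a sketch in Python rather than anything that exists in the ganglia library today. `FloatMetric`, `precision`, `to_text` and `render` are all hypothetical names: the metric carries a precision attribute, and a collector can supply its own conversion hook instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FloatMetric:
    """Hypothetical metric description; not part of the ganglia library."""
    name: str
    precision: int = 17                               # significant digits to keep
    to_text: Optional[Callable[[float], str]] = None  # optional conversion hook

    def render(self, value: float) -> str:
        # A collector-supplied hook wins; otherwise use the precision attribute.
        if self.to_text is not None:
            return self.to_text(value)
        return f"{value:.{self.precision}g}"

load = FloatMetric("load_one", precision=3)
gps = FloatMetric("gps_lat", to_text=lambda v: f"{v:.6f}")

print(load.render(0.123456))  # 0.123
print(gps.render(32.8842))    # 32.884200
```

Either way the collector always hands back text, which is the "can't return floats" rule.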

keep in mind too that on linux everything is ascii thanks to /proc.

Not everyone uses Linux, you insensitive clod! :P

I can't tell you how annoyed I was to find that all the cool metrics were so easy to collect on Linux.

i really don't like the CLUSTER and GRID tags.  is a federation of
clusters really a Grid?  is a federation of grids still a Grid?  i'm sure
that over time the vocabulary will change. what about planetary-scale systems or network overlays? i mean.. it's just vocabulary and i want the ganglia admins to define that vocabulary for themselves.

I always liked the term "Empire."

as far as the shorthand delimiters, i'm just looking for a simple way to translate a block of xml to a filesystem-like format. this is important for the location of round-robin databases (or other database names for that matter). it would be trivial to take

the tree library is completely hash table based. there are no linked lists. i wrote it that way intentionally to make it fast for insert/reads/lookups etc. the only problem with using hash tables is that you have to have a unique "key" at each level of the tree path.

Excellent.  That means you can store all kinds of data at the lowest level.

Such as an RRD path, which you compute only once, if the value doesn't exist already in the hash for that metric. You can even make this an inheritable value - when you create a new branch of the tree, you take the higher level's metric_path value and slap another delimiter and the current branch name on it. Once you get down to a leaf, you add "metric_name.rrd" and that's it.
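
A sketch of that inheritance, in Python instead of the real tree library (the `Node` class, the "/" delimiter and the cluster/host/metric names are assumptions for illustration only):

```python
class Node:
    """Hypothetical tree node; children live in a hash table keyed by name."""

    def __init__(self, name, parent=None, delimiter="/"):
        self.name = name
        self.delimiter = delimiter
        self.children = {}
        # Inherit the parent's metric_path and append this branch's name.
        if parent is None:
            self.metric_path = name
        else:
            self.metric_path = parent.metric_path + delimiter + name
        self._rrd_path = None  # filled in on first request only

    def child(self, name):
        # Look the branch up by its unique key, creating it if needed.
        node = self.children.get(name)
        if node is None:
            node = Node(name, parent=self, delimiter=self.delimiter)
            self.children[name] = node
        return node

    def rrd_path(self):
        # Computed once, then reused on every later lookup.
        if self._rrd_path is None:
            self._rrd_path = self.metric_path + ".rrd"
        return self._rrd_path

root = Node("Meteor")
leaf = root.child("computer-0-2").child("cpu_user")
print(leaf.rrd_path())  # Meteor/computer-0-2/cpu_user.rrd
```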

By adding the attribute everywhere, you cover summary data as well.

Of course, that does significantly increase the number of table lookups. Unless you break off the attribute table into a separate hash with name values keyed to the value pointers of each level of the hash. You still use more memory but probably not more than 100k for a very large cluster, I should think.
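
The separate-hash variant could look something like this. Again a Python sketch with made-up names, where `id()` stands in for the "value pointer" of each level and the tree itself is left untouched:

```python
# Plain nested hash tables; the names are illustrative only.
tree = {"Meteor": {"computer-0-2": {"cpu_user": {}}}}

# Side table: identity of each branch -> attributes for that branch.
# id() values are only valid while the branch objects are alive.
attrs = {}

def annotate(branch, path="", delimiter="/"):
    """Record a metric_path for this branch and all branches below it."""
    attrs[id(branch)] = {"metric_path": path}
    for name, child in branch.items():
        child_path = path + delimiter + name if path else name
        annotate(child, child_path, delimiter)

annotate(tree)

leaf = tree["Meteor"]["computer-0-2"]["cpu_user"]
print(attrs[id(leaf)]["metric_path"])  # Meteor/computer-0-2/cpu_user
```

One extra lookup per attribute, but the main tree keeps its shape.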

say i wanted to do a tree_foreach on the tree it could be something like this...

<processes ...>
  <1034 ...>
     <cpu ...>
       <user .../>
     </cpu>
  </1034>
</processes>
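
As far as I can follow it, a walk over a tree shaped like the one above might go like this. It is a Python sketch over nested hash tables: this `tree_foreach` and its callback signature are assumptions, not the actual library call, and "/" is just one possible filesystem-style delimiter.

```python
def tree_foreach(tree, visit, path=()):
    """Call visit(path, subtree) for every branch, parents before children."""
    for key, subtree in tree.items():
        here = path + (key,)
        visit(here, subtree)
        tree_foreach(subtree, visit, here)

# Same shape as the example above, with attributes left out.
tree = {"processes": {"1034": {"cpu": {"user": {}}}}}

paths = []
tree_foreach(tree, lambda p, _subtree: paths.append("/".join(p)))
print(paths)
# ['processes', 'processes/1034', 'processes/1034/cpu', 'processes/1034/cpu/user']
```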

Either it's too early in the morning or I need to go find a monolith to help me understand this (or a babelfish?). Maybe you could add the attributes you needed to a hash table like the above.

for example. :) but we'd need to delimit the DNS portion somehow because say...

ORG:RocksClusters:Meta:San Diego Grid:SDSC Rocks Grid:Meteor::computer-0-2

doesn't work... only part of the route (Meta.RocksClusters.ORG) points to the machine with the data.

Delimit the DNS name with dots, maybe?
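
Something like the following, as a Python sketch. The route string is my own rearrangement of the names in the example, not an agreed format, and `split_route` is a made-up helper: the host part keeps its dots, and ":" separates everything after it.

```python
def split_route(route, delimiter=":"):
    """Split a route into (dotted DNS name, remaining path levels)."""
    host, *levels = route.split(delimiter)
    return host, levels

# Hypothetical route: dotted host first, then the path below it.
route = "Meta.RocksClusters.ORG:San Diego Grid:SDSC Rocks Grid:Meteor:computer-0-2"
host, levels = split_route(route)

print(host)    # Meta.RocksClusters.ORG
print(levels)  # ['San Diego Grid', 'SDSC Rocks Grid', 'Meteor', 'computer-0-2']
```

This only holds up if no level name contains the delimiter itself.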

ps.  have you guys seen the latest pirate movie?


pps.  it's rated "Aaaarrrrrr"

Pirate movies.  Don't talk to me about pirate movies.

