Re: [Ganglia-developers] g3 (really long)

Steven Wagner Mon, 31 Mar 2003 10:43:31 -0800

matt massie wrote:

Today, Federico Sacerdoti wrote forth saying...

I don't understand. Floats and doubles will often be generated in binary.
For example a detailed running time from MPI_Wtime() or a GPS
coordinate. No matter how you slice it, storing it in ascii means a
conversion. That conversion loses precision. We could store the
exponent and mantissa as separate integers in ascii, but that is a
different strategy with its own problems.



it's true that many of the system/library calls we use to collect metrics
return floats/doubles in binary form.  even if we send/save the data in
binary form we still have to convert it once we want to output xml (since
xml is text).  if we use ascii on the wire then each node is responsible
to convert its binary to ascii instead of having a single machine convert
every value for every host on the fly.  the only difference is where the
conversion is happening... not if it happens or not.  it makes more sense
to distribute the conversion since it will make gmond answer MUCH faster.
your benchmarks on gmond demonstrated that gmond could only provide about
6 samples/sec for a cluster of 100 nodes.  that's largely because of the
thousands of binary conversions that it needs to make.  if everything in
memory is ascii.. no conversions are necessary.

So... assuming we use the XDR-push-to-metadaemon model, what if the
metadaemon's data source thread does the binary-to-ascii conversions before
storing the data?  It's after the data has been captured and time-stamped,
so at that point it's no longer time-sensitive (since you're not relying on
a data-storage-subsystem-generated timestamp) - if it takes ten seconds to
convert all them thar reals to chars, it's more acceptable than hitting a
monitoring core with the same task.

Would it be feasible to add another metric attribute for floats that
governs precision?  Maybe it just needs to be part of the metric collection
code - supply a hook for a conversion function in the ganglia library, but
make sure that devs know that they can't return floats (or perhaps just
"caveat coder").
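
A minimal sketch of the precision idea, in Python rather than the C that
gmond is written in.  The names format_metric, parse_metric and the
precision argument are made up for illustration; none of them is part of
the ganglia library.  It assumes the values are IEEE-754 doubles, for which
17 significant digits are enough to get the identical double back from the
text.  Fewer digits trade accuracy for shorter strings.

```python
def format_metric(value, precision=17):
    """Render a float as ASCII with `precision` significant digits.

    Hypothetical helper: stands in for a per-metric conversion hook.
    """
    return "%.*g" % (precision, value)


def parse_metric(text):
    """Turn the ASCII form back into a float."""
    return float(text)


# An illustrative double, e.g. a detailed timestamp-like reading.
wall_time = 1049136211.123456789

full = format_metric(wall_time)                # 17 significant digits
short = format_metric(wall_time, precision=6)  # deliberately truncated

round_trips = parse_metric(full) == wall_time   # same double comes back
lossy = parse_metric(short) != wall_time        # precision was dropped
```

So the loss comes from how many digits the precision attribute asks for,
and the metric's author gets to pick that number.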

keep in mind too that on linux everything is ascii thanks to /proc.


Not everyone uses Linux, you insensitive clod! :P

I can't tell you how annoyed I was to find that all the cool metrics were
so easy to collect on Linux.

i really don't like the CLUSTER and GRID tags.  is a federation of
clusters really a Grid?  is a federation of grids still a Grid?  i'm sure
that over time the vocabulary will change.  what about planetary-scale
systems or network overlays.. i mean.. it's just vocabulary and i want the
ganglia admins to define that vocabulary for themselves.


I always liked the term "Empire."

as far as the shorthand delimiters, i'm just looking for a simple way to
translate a block of xml to a filesystem-like format.  this is important
for the location of round-robin databases (or other database names for
that matter).  it would be trivial to take

the tree library is completely hash table based.  there are no linked
lists.  i wrote it that way intentionally to make it fast for
insert/reads/lookups etc.  the only problem with using hash tables is that
you have to have a unique "key" at each level of the tree path.


Excellent.  That means you can store all kinds of data at the lowest level.

Such as an RRD path, which you compute only once, if the value doesn't exist
already in the hash for that metric.  You can even make this an inheritable
value - when you create a new branch of the tree, you take the higher
level's metric_path value and slap another delimiter and the current branch
name on it.  Once you get down to a leaf, you add "metric_name.rrd" and
that's it.
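
Roughly what I have in mind, as a Python sketch rather than anything from
the actual tree library.  The Node class, its child() and rrd() methods and
the "/" delimiter are all invented for illustration; the only thing taken
from the discussion is the metric_path value handed down from each level.
The metric name cpu_user is just an example.

```python
class Node:
    """One level of the tree; children live in a dict (a hash table),
    so every child needs a unique key at its level."""

    def __init__(self, name, parent=None, delimiter="/"):
        self.name = name
        self.delimiter = delimiter
        self.children = {}
        # Inherit the higher level's metric_path and append this branch
        # name. Done once, when the branch is created.
        if parent is None:
            self.metric_path = name
        else:
            self.metric_path = parent.metric_path + delimiter + name
        self._rrd_path = None  # computed lazily, then cached

    def child(self, name):
        """Look up a child by key, creating the branch if it is missing."""
        node = self.children.get(name)
        if node is None:
            node = Node(name, parent=self, delimiter=self.delimiter)
            self.children[name] = node
        return node

    def rrd(self):
        """Return the RRD file path, computing it only on first use."""
        if self._rrd_path is None:
            self._rrd_path = self.metric_path + ".rrd"
        return self._rrd_path


root = Node("SDSC Rocks Grid")
leaf = root.child("Meteor").child("computer-0-2").child("cpu_user")
path = leaf.rrd()
```

After the first call the path is just a stored string on the node, so later
lookups cost one hash hit per level and no string building.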


By adding the attribute everywhere, you cover summary data as well.

Of course, that does significantly increase the number of table lookups.
Unless you break off the attribute table into a separate hash with name
values keyed to the value pointers of each level of the hash.  You still
use more memory but probably not more than 100k for a very large cluster, I
should think.

say i wanted to do a tree_foreach on the tree it could be something like
this...
<processes ...>
  <1034 ...>
     <cpu ...>
       <user .../>
     </cpu>
  </1034>
</processes>

Either it's too early in the morning or I need to go find a monolith to
help me understand this (or a babelfish?).  Maybe you could add the
attributes you needed to a hash table like the above.

for example. :) but we'd need to delimit the DNS portion somehow because
say...
ORG:RocksClusters:Meta:San Diego Grid:SDSC Rocks Grid:Meteor::computer-0-2
doesn't work... only part of the route (Meta.RocksClusters.ORG) points to
the machine with the data.


Delimit the DNS name with dots, maybe?
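
Something like this, as a rough Python sketch.  It assumes a route format
nobody has agreed on: the dotted DNS name comes first, and the rest of the
route is colon-delimited.  The function name split_route is made up, and
dropping the empty component left by "::" is my assumption, not a decision
about what the double colon should mean.

```python
def split_route(route, delimiter=":"):
    """Split a route into (dns_name, path_components).

    Hypothetical format: the first component is a dotted DNS name that
    points at the machine holding the data; the remainder is the path
    inside that machine's tree.
    """
    dns_name, _, rest = route.partition(delimiter)
    # Assumption: empty components (from "::") are simply skipped.
    components = [part for part in rest.split(delimiter) if part]
    return dns_name, components


dns_name, components = split_route(
    "Meta.RocksClusters.ORG:San Diego Grid:SDSC Rocks Grid:Meteor::computer-0-2"
)
```

Because dots never appear as the path delimiter, the DNS part can be pulled
off the front without guessing where it ends.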

ps.  have you guys seen the latest pirate movie?


pps.  it's rated "Aaaarrrrrr"


Pirate movies.  Don't talk to me about pirate movies.
