-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

guys-

here's another update on g3 with a new demo (if you're interested).

http://matt-massie.com/g3/
  ganglia-3.0.0.tar.gz (latest snapshot of the source)
  example.html (a markup of the summary test xml output)

today i wrote the on-the-fly summary portion and the g2 compatibility 
portion of the tree code.

now, the tree code allows you to import g2 xml streams (although the data
will only be exported in g3 format).  building in that compatibility was
relatively easy and i needed it to truly test the summary code (i didn't
feel like hand-typing a huge xml file :).  i tested this code with
valgrind (no leaks) and it compiles and runs like a charm on linux and 
cygwin.

the library was able to parse about 60ish metrics on 172ish machines in
4ish clusters with a tree depth of 3ish in less than 1ish second.  that's
acceptable and i know of many ways to make it faster (like using a stack
for allocating/freeing summary tree nodes instead of malloc/free)... but
we'll put that in 3.1.0.. right now i want to focus on getting a stable
3.0.0 release.
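
a rough sketch of the stack idea, in python for brevity (the real win
would be in c, where it avoids malloc/free calls).  the Node and NodePool
names and their fields are made up for illustration and are not taken from
the g3 source:

```python
class Node:
    """one summary tree node (fields are illustrative, not g3's)."""
    __slots__ = ("name", "value", "samples")

    def __init__(self):
        self.reset()

    def reset(self):
        self.name = None
        self.value = 0
        self.samples = 0


class NodePool:
    """keeps freed nodes on a stack so they can be reused."""

    def __init__(self):
        self._free = []      # stack of recycled nodes
        self.created = 0     # how many nodes were really allocated

    def alloc(self):
        if self._free:
            return self._free.pop()   # reuse the most recently freed node
        self.created += 1
        return Node()

    def free(self, node):
        node.reset()                  # wipe old data before recycling
        self._free.append(node)


pool = NodePool()
a = pool.alloc()
b = pool.alloc()
pool.free(a)
c = pool.alloc()   # comes off the stack, no new allocation
```

after the last call, c is the same object that was freed and only two
nodes were ever created.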

as things stand now, g3 has three library dependencies.  i want to keep 
the number as low as possible.. they are ...

expat, zlib and gnu mp.  

all very portable.  all well-written and all necessary.  the gnu mp
library is an arbitrary precision math library.  since g3 will be used to
summarize a huge amount of data.. i needed a math library to handle
overflows.  g3 will have three basic data types (string, number and
float).  the number and float values can be as large as your memory will
allow. (no more uint8, uint16, int32 etc).  i want to keep it simple.
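
to make the overflow concern concrete, here is a small python sketch.
python's built-in ints grow as needed, so they stand in for gnu mp
integers here; the readings are invented and the function names are
hypothetical:

```python
UINT32_MAX = 2**32 - 1


def wrapping_sum_u32(values):
    """what a fixed 32-bit accumulator would give (wraps on overflow)."""
    total = 0
    for v in values:
        total = (total + v) & UINT32_MAX
    return total


def exact_sum(values):
    """python ints are arbitrary precision, so this never overflows."""
    return sum(values)


# made-up data: 172 hosts each reporting 3 * 10**9 (fits in a uint32)
readings = [3_000_000_000] * 172

wrapped = wrapping_sum_u32(readings)   # silently wrong
exact = exact_sum(readings)            # the true total
```

each reading fits in 32 bits, but the total does not, so the fixed-width
sum wraps around while the arbitrary precision sum stays correct.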

if you take a look at the end of 
http://matt-massie.com/g3/example.html

you'll see how g3 summarizes the data on the fly while maintaining the
hierarchical metric space.  it just puts mu/metric tags directly under ou
tags without any hosts in between.
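
a rough python sketch of that rollup.  it works on a finished nested dict
rather than summarizing while parsing like g3 does, and the "hosts" and
"children" keys, the metric name and the host names are all assumptions
for illustration, not the g3 format:

```python
def summarize(ou):
    """return {metric_name: [value, samples]} for one ou, folding in
    its hosts and, recursively, its child ous."""
    summary = {}

    def add(name, value, samples):
        entry = summary.setdefault(name, [0, 0])
        entry[0] += value
        entry[1] += samples

    for metrics in ou.get("hosts", {}).values():
        for name, value in metrics.items():
            add(name, value, 1)            # one host reading = one sample
    for child in ou.get("children", {}).values():
        for name, (value, samples) in summarize(child).items():
            add(name, value, samples)      # child summary, hosts dropped
    return summary


# made-up grid: one top-level ou holding two cluster ous
grid = {
    "children": {
        "cluster-a": {"hosts": {"h1": {"cpu_num": 2}, "h2": {"cpu_num": 4}}},
        "cluster-b": {"hosts": {"h3": {"cpu_num": 8}}},
    }
}

top = summarize(grid)
```

the summary for each ou is keyed by metric name only, so the host level
disappears while the ou nesting is kept.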

oh yeah, before i forget.. the latest code completely ignores attribute
order per federico's comments.  the attribute order doesn't matter.. i
have a perfect hash with index names that is used instead.  people will
likely slice and dice ganglia xml and i don't want to assume they will
preserve any attribute order.
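
a small python sketch of looking attributes up by name instead of by
position, using python's expat binding.  it relies on a plain dict rather
than a perfect hash, and the tag and attribute names in the sample xml are
assumptions, not the actual g3 schema:

```python
import xml.parsers.expat


def parse_metrics(xml_text):
    """collect (name, value) pairs from <metric> tags, looking attributes
    up by name rather than by position."""
    found = []

    def start(tag, attrs):
        # attrs arrives as a dict here, so attribute order never matters
        if tag == "metric":
            found.append((attrs["name"], attrs["value"]))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return found


a = parse_metrics('<host><metric name="cpu_num" value="2"/></host>')
b = parse_metrics('<host><metric value="2" name="cpu_num"/></host>')
```

both calls give the same result even though the attributes are swapped.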

currently .. string summaries are not handled correctly.. i was more
interested in getting the numbers and floats right first.  you'll see
there is a "samples" attribute on the summary metrics (if samples is not
specified it is assumed to be 1).  samples is the number of data
points used to get the value.. if we want an average we just take
value/samples (just like the old sum/num attributes.. but i think
value/samples makes more sense.. although.. it doesn't rhyme).
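
the value/samples arithmetic can be sketched like this in python.  the
function names and the numbers are made up for illustration; only the
"samples defaults to 1" rule and average = value/samples come from the
description above:

```python
def from_attrs(attrs):
    """build a (value, samples) pair from xml attributes;
    a missing samples attribute means 1."""
    return (float(attrs["value"]), int(attrs.get("samples", 1)))


def merge(a, b):
    """combine two (value, samples) summaries into one."""
    return (a[0] + b[0], a[1] + b[1])


def average(summary):
    value, samples = summary
    return value / samples


one_host = from_attrs({"value": "0.5"})                  # samples defaults to 1
cluster = from_attrs({"value": "3.5", "samples": "7"})   # already summarized
both = merge(one_host, cluster)
```

merging just adds the values and adds the samples, so an average is
available at any level of the tree without going back to the raw data.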

- -matt







-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE+1lrJVmIXr0CKtmERAgKhAJ4raepHFl2RK8dtRJRWwL+emmylyQCfTK7w
w3ErxYmWmIERIvd4rPTyokM=
=1dJw
-----END PGP SIGNATURE-----

