i just uploaded a new snapshot of ganglia 2.6.0 to
http://matt-massie.com/ganglia/ganglia-2.6.0.200501191706.tar.gz

the only things left to do are
   - process gmetric messages
   - cleanup old hosts and metrics

i compiled and installed this snapshot on linux and windows (and each 
host exchanged information via unicast udp).

i ran a test of this snapshot overnight last night by starting gmond 
with a full metric configuration and making ~13 million requests for data 
on the xml port.  there were no xml errors and no memory leaks (actually 
2.6.0 will use considerably less memory than 2.5.x), and gmond only used 
about 0.07% cpu to handle the requests.

i will update the documentation for gmond.conf soon (i didn't have time 
today).  if you want to see the default gmond configuration for your 
particular platform ... just run...

% ./gmond -t

with 2.6.0, every aspect of data collection and message sending is 
tweakable.

if you want to see a list of all the metrics supported by gmond, run...

% ./gmond -m
load_one
mem_total
os_release
proc_run
load_five
gexec
...
cpu_num
cpu_speed
pkts_out
swap_free

i also added a new feature that was very simple to add but that i think 
you might find useful.  to see the total minimum bandwidth that a specific 
configuration will generate, run

% ./gmond --conf ./test.conf -b
7.545789 bytes/sec

it would be pretty easy to make this more elaborate in the future (such 
as building an algorithm for handling the value_thresholds as well). 
currently, the value is just a summation of all the metric message sizes 
divided by the time_threshold (i assume any string metrics are their 
maximum size).
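
a rough sketch of that calculation, not the actual gmond code: the struct 
and function names below are made up for illustration, and it assumes each 
metric's message size is divided by the time_threshold that applies to that 
metric.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical description of one configured metric; not a gmond type */
struct metric_cost {
    const char *name;         /* metric name, e.g. "load_one" */
    size_t message_bytes;     /* size of one message (strings counted at max size) */
    unsigned time_threshold;  /* seconds: longest allowed gap between sends */
};

/* minimum bandwidth in bytes/sec: each metric is sent at least once per
 * time_threshold, so it contributes message_bytes / time_threshold. */
double min_bandwidth(const struct metric_cost *m, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(m[i].time_threshold > 0);  /* avoid dividing by zero */
        total += (double)m[i].message_bytes / m[i].time_threshold;
    }
    return total;
}
```

under these assumptions, a metric with an 8-byte message and a 
time_threshold of 4 seconds would contribute 2 bytes/sec (the numbers are 
invented, not real gmond message sizes).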

a future feature that would be nice is to have a fixed bandwidth 
restriction.

i also applied a patch for linux that Marcelo Matus submitted to 
bugzilla.ganglia.info.  his report says "gmond needs a small 
modification to treat GFS as another network file system, ie, such as 
NFS, SAMBA, etc."

-------

about the code for the new gmond.  i want to explain how things have 
been simplified (i hope).

at line 1162 of gmond.c you'll see the setup_metric_callbacks() 
function.  i've added the registration of all metrics here.  it's 
important to note that registering the metric here doesn't mean it is 
collected but rather that it can be collected if the user asks for it in 
the configuration file.
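
the difference between registering and collecting can be sketched roughly 
like this.  this is not the code in gmond.c: the types, the table size, and 
the function names (register_metric, find_metric, fake_load_one, 
fake_cpu_num) are all hypothetical.  registering only makes a callback 
available; nothing gets called until the configuration names the metric and 
it is looked up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical callback type: returns the current value of a metric */
typedef double (*metric_func)(void);

struct metric_entry {
    const char *name;
    metric_func func;
};

#define MAX_METRICS 64
static struct metric_entry registry[MAX_METRICS];
static size_t registry_len = 0;

/* make a metric available; returns 0 on success, -1 if the table is full */
int register_metric(const char *name, metric_func func)
{
    assert(name != NULL && func != NULL);
    if (registry_len >= MAX_METRICS)
        return -1;
    registry[registry_len].name = name;
    registry[registry_len].func = func;
    registry_len++;
    return 0;
}

/* look up a metric named in the configuration; NULL means "not registered" */
metric_func find_metric(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].func;
    return NULL;
}

/* stand-in collectors returning fixed values, for illustration only */
double fake_load_one(void) { return 0.25; }
double fake_cpu_num(void)  { return 2.0; }
```

a metric that is registered but never requested in the configuration simply 
sits in the table and costs nothing at run time.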

if you look inside ./gmond/conf.c you'll find the function 
build_default_gmond_configuration(), which builds the default 
configuration for gmond based on the platform.  this is the 
configuration that will be used if no configuration file is specified 
(with the --conf flag).  only the metrics supported on a platform are 
added to the default configuration.

last but not least, if you look in ./lib/protocol.x you'll see the 
function for sending and receiving the metrics.  if you register a 
metric for collection in setup_metric_callbacks() which isn't in 
protocol.x you'll get an error message that "gmond doesn't know how to 
send metric 'foo'".  the ./lib/protocol.x is NOT platform specific. 
even if a metric is not implemented on a specific platform, it can still 
be stored and reported... SOOOO... a linux box that receives a solaris 
"wcache" metric will not be confused at all by the message.  mixing 
hosts from different platforms will not cause us any more problems. 
mixing 2.5.x and 2.6.x is okay on all platforms but solaris (which will 
be 80% right).
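
as a loose illustration of that check (not how protocol.x or gmond actually 
implement it): the table contents and the function names can_send and 
check_sendable are assumptions.  the point is that the list of sendable 
metrics is the same on every platform, so it can include metrics the local 
host never collects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* hypothetical list of metrics the wire protocol can carry.  it is the
 * same on every platform, so it includes metrics (like solaris's
 * "wcache") that the local host may never collect itself. */
static const char *const sendable[] = { "load_one", "cpu_num", "wcache" };

/* returns 1 if the protocol table knows the metric, 0 otherwise */
int can_send(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof sendable / sizeof sendable[0]; i++)
        if (strcmp(sendable[i], name) == 0)
            return 1;
    return 0;
}

/* check a registered metric against the protocol table; on failure,
 * write the error text into err and return -1. */
int check_sendable(const char *name, char *err, size_t errlen)
{
    if (can_send(name))
        return 0;
    snprintf(err, errlen, "gmond doesn't know how to send metric '%s'", name);
    return -1;
}
```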

-matt
-- 
PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E F40B 242A 5984 ACBC 91D3'

    They that can give up essential liberty to obtain a little
       temporary safety deserve neither liberty nor safety.
   --Benjamin Franklin, Historical Review of Pennsylvania, 1759
