i just uploaded a new snapshot of ganglia 2.6.0 to http://matt-massie.com/ganglia/ganglia-2.6.0.200501191706.tar.gz
the only things left to do are
- process gmetric messages
- cleanup old hosts and metrics
i compiled and installed this snapshot on linux and windows (and each
host exchanged information via unicast udp).
i tested this snapshot overnight by starting gmond with a full metric
configuration and making ~13 million requests for data on the xml port.
there were no xml errors and no memory leaks (actually 2.6.0 will use
considerably less memory than 2.5.x), and gmond only used about 0.07%
cpu to handle the requests.
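
for anyone who wants to repeat that kind of test, here is a minimal sketch of one request against the xml port, in python. it is not the script used for the run above. the port number 8649 and the assumption that gmond closes the connection once the dump is finished are not stated in this mail, and the function names are made up.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Connect to the xml port and return everything sent, as bytes.

    Assumes the server writes one xml document and then closes the
    connection (port 8649 is an assumption, adjust to your config).
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def check_once(host="localhost", port=8649):
    """Fetch one dump and parse it; raises ET.ParseError on malformed xml."""
    root = ET.fromstring(fetch_gmond_xml(host, port))
    return root.tag
```

calling check_once() in a loop and counting ET.ParseError exceptions would give a rough "xml errors" figure like the one above.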
i will update the documentation for gmond.conf soon (i didn't have time
today). if you want to see the default gmond configuration for your
particular platform ... just run...
% ./gmond -t
with 2.6.0. every aspect of data collection and message sending is
tweakable.
if you want to see a list of all the metrics supported by gmond, run...
% ./gmond -m
load_one
mem_total
os_release
proc_run
load_five
gexec
...
cpu_num
cpu_speed
pkts_out
swap_free
i also added a new feature that was very simple to add but that i think
you might find useful. to see the total minimum bandwidth that a
specific configuration will generate, run
% ./gmond --conf ./test.conf -b
7.545789 bytes/sec
it would be pretty easy to make this more elaborate in the future (such
as building an algorithm for handling the value_thresholds as well).
currently, the value is just a summation of all the metric message sizes
divided by the time_threshold (i assume any string metrics are maximum
size).
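
a rough sketch of that calculation in python (not the gmond source). it assumes metrics are grouped and each group has its own time_threshold in seconds; the message sizes and thresholds in the example are made up for illustration.

```python
def min_bandwidth(groups):
    """Estimate minimum bandwidth in bytes/sec.

    groups: list of (time_threshold_seconds, [message_size_bytes, ...]).
    Each metric message is assumed to be sent once per time_threshold;
    value_thresholds are ignored, as in the description above.
    """
    total = 0.0
    for time_threshold, sizes in groups:
        if time_threshold <= 0:
            raise ValueError("time_threshold must be positive")
        total += sum(sizes) / time_threshold
    return total


# hypothetical configuration: two slow-changing metrics and two fast ones
example = [
    (1200, [60, 60]),  # 120 bytes every 1200 s
    (20, [50, 50]),    # 100 bytes every 20 s
]

if __name__ == "__main__":
    print("%f bytes/sec" % min_bandwidth(example))
```

since string metrics are counted at maximum size, the sizes passed in for them should be the largest possible message, not the typical one.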
a future feature that would be nice is to have a fixed bandwidth
restriction.
i also applied a patch for linux that Marcelo Matus submitted to
bugzilla.ganglia.info. from his report: "gmond needs a small
modification to treat GFS as another network file system, ie, such as
NFS, SAMBA, etc.:"
-------
about the code for the new gmond. i want to explain how things have
been simplified (i hope).
at line 1162 of gmond.c you'll see the setup_metric_callbacks()
function. i've added the registration of all metrics here. it's
important to note that registering the metric here doesn't mean it is
collected but rather that it can be collected if the user asks for it in
the configuration file.
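
the difference between "registered" and "collected" can be sketched like this in python. this is only an illustration of the idea, not how gmond.c is written; the callbacks, the REGISTERED table and metrics_to_collect() are all hypothetical.

```python
# hypothetical stand-ins for real metric callbacks
def load_one():
    return 0.0


def cpu_num():
    return 1


# registration: every metric that *can* be collected on this build
REGISTERED = {"load_one": load_one, "cpu_num": cpu_num}


def metrics_to_collect(configured_names):
    """Return only the registered callbacks the configuration asks for."""
    selected = {}
    for name in configured_names:
        if name not in REGISTERED:
            raise KeyError("metric '%s' is not registered" % name)
        selected[name] = REGISTERED[name]
    return selected
```

here a configuration that lists only load_one ends up collecting only load_one, even though cpu_num is registered too.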
if you look inside ./gmond/conf.c you'll find the function
build_default_gmond_configuration() which builds the default
configuration for gmond based on the platform. this is the
configuration that will be used if no configuration file is specified
(with the --conf flag). only the metrics supported on a platform are
added to the default configuration.
last but not least, if you look in ./lib/protocol.x you'll see the
function for sending and receiving the metrics. if you register a
metric for collection in setup_metric_callbacks() which isn't in
protocol.x you'll get an error message that "gmond doesn't know how to
send metric 'foo'". the ./lib/protocol.x is NOT platform specific.
even if a metric is not implemented on a specific platform, it can still
be stored and reported... SOOOO... a linux box that received a solaris
"wcache" metric will not be confused at all by the message. mixing
hosts from different platforms will not cause us any more problems.
mixing 2.5.x and 2.6.x is okay on all platforms but solaris (which will
be 80% right).
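
a small python sketch of the two points above: the check that produces the "doesn't know how to send" error, and the fact that storing a received metric depends only on the shared protocol, not on what the local platform can collect. the PROTOCOL set and both functions are hypothetical, this is not what protocol.x or gmond.c actually contain.

```python
# hypothetical list of metric names the wire protocol knows about;
# the point is that it is the same on every platform
PROTOCOL = {"load_one", "cpu_num", "wcache"}


def unsendable(registered_names):
    """Return one error string per registered metric missing from PROTOCOL."""
    return [
        "gmond doesn't know how to send metric '%s'" % name
        for name in registered_names
        if name not in PROTOCOL
    ]


def store_received(hosts, host, name, value):
    """Store a metric received from another host.

    Only PROTOCOL is consulted, so a metric the local platform cannot
    collect itself (e.g. solaris "wcache" on linux) is stored fine.
    """
    if name not in PROTOCOL:
        raise ValueError("unknown metric '%s'" % name)
    hosts.setdefault(host, {})[name] = value
    return hosts
```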
-matt
--
PGP fingerprint 'A7C2 3C2F 8445 AD3C 135E F40B 242A 5984 ACBC 91D3'
They that can give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety.
--Benjamin Franklin, Historical Review of Pennsylvania, 1759
