I just committed changes to the website and new disk metrics. The website
changes primarily take advantage of the new metric attributes that Matt
added recently.
The disk metrics will report local disk sizes. Although gmetric scripts
do exist for this purpose, the new metrics are coded entirely in C, and
should be much more efficient. There is also a new metric,
"part_max_used", which reports how full the fullest local disk partition
on the host is. If an admin sees a value of 97% for a node, they know
the node needs attention, although not which partition is the culprit.
Unfortunately, these new metrics are only available on Linux for now.
The LOCATION attribute in the HOST tag specifies where the host is
physically located in the cluster: Rack, Rank, and Plane. These are
given as a 3D Euclidean coordinate "x,y,z". This attribute enables
technicians (or more probably, you and me ;) to locate a malfunctioning
node in a large cluster quickly.
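
As a rough illustration of the "x,y,z" format, here is a minimal sketch of
parsing such a string. The struct, the function name parse_location(), the
use of integer coordinates, and the mapping of x, y, z to Rack, Rank, Plane
in that order are all assumptions made for the example, not the code in the
patch.

```c
#include <stdio.h>

/* Hypothetical holder for a host position. The x/y/z to
 * rack/rank/plane mapping is an assumption for this sketch. */
struct host_location {
    int rack;   /* x */
    int rank;   /* y */
    int plane;  /* z */
};

/* Parse "x,y,z" into *loc.
 * Returns 0 on success, -1 on malformed input (loc is left untouched). */
int parse_location(const char *s, struct host_location *loc)
{
    int x, y, z, n;
    char trailing;

    if (s == NULL || loc == NULL)
        return -1;

    /* The final %c only matches if junk follows the third number,
     * so a clean string yields exactly 3 conversions. */
    n = sscanf(s, "%d,%d,%d%c", &x, &y, &z, &trailing);
    if (n != 3)
        return -1;

    loc->rack = x;
    loc->rank = y;
    loc->plane = z;
    return 0;
}
```

A caller would pass the attribute value, e.g. parse_location("3,12,0", &loc),
and check the return code before using the fields.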
Other changes included in this patch are outlined in the cvs log below.
Disk Metrics for Linux Patch.
My contribution before the 2.5.0 test period. Added three disk metrics
for Linux: disk_total, disk_free, and part_max_used. The first two
should be self-explanatory; the last one means "of all local disk
partitions on this node, the fullest one is x% used." Matt suggested
this as an early warning system for sysadmins.
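
To make the relationship between the three metrics concrete, here is a
small sketch of how they could be derived from per-partition numbers. The
struct names, the function summarize_disks(), and the choice of gigabytes
as the unit are assumptions for illustration; the actual patch may
organize this differently.

```c
#include <stddef.h>

/* Hypothetical per-partition input; GB is an assumed unit. */
struct part_usage {
    double total_gb;
    double free_gb;
};

/* Hypothetical result holding the three metrics named above. */
struct disk_summary {
    double disk_total;     /* sum of partition sizes */
    double disk_free;      /* sum of free space */
    double part_max_used;  /* percent used on the fullest partition */
};

struct disk_summary summarize_disks(const struct part_usage *parts, size_t n)
{
    struct disk_summary s = { 0.0, 0.0, 0.0 };
    size_t i;

    for (i = 0; i < n; i++) {
        double used_pct;

        /* Skip empty or pseudo filesystems to avoid dividing by zero. */
        if (parts[i].total_gb <= 0.0)
            continue;

        used_pct = 100.0 * (parts[i].total_gb - parts[i].free_gb)
                         / parts[i].total_gb;

        s.disk_total += parts[i].total_gb;
        s.disk_free  += parts[i].free_gb;
        if (used_pct > s.part_max_used)
            s.part_max_used = used_pct;
    }
    return s;
}
```

Note that a node can have plenty of disk_free overall while part_max_used
is still high, which is why the third metric is useful as a warning.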
Other new features in this patch include true support of the LOCATION
attribute in the HOST tag, the ability to specify trusted hosts with
DNS names as well as IP addresses, and the get_first_interface() method
to find a network interface to use. During testing, I found some false
negatives: interfaces that said they were not "multicast enabled", but
worked fine with ganglia. Now, if we cannot find a multicast interface,
we fall back to any UP network interface other than loopback.
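
The fallback rule can be sketched roughly as follows. This is not the
get_first_interface() code from the patch; it is an illustration using
getifaddrs(), and the names rank_iface_flags() and pick_interface() are
made up for the example.

```c
#define _DEFAULT_SOURCE   /* expose IFF_* flags under strict -std modes */
#include <ifaddrs.h>
#include <net/if.h>
#include <stdio.h>

/* Rank an interface by its flags:
 *   2 = up, not loopback, advertises multicast (preferred)
 *   1 = up, not loopback, no multicast flag (fallback)
 *   0 = down or loopback (unusable) */
int rank_iface_flags(unsigned int flags)
{
    if (!(flags & IFF_UP) || (flags & IFF_LOOPBACK))
        return 0;
    return (flags & IFF_MULTICAST) ? 2 : 1;
}

/* Copy the name of the best-ranked interface into name (len bytes).
 * Returns the rank of the chosen interface, 0 if none qualifies,
 * or -1 on bad arguments or if getifaddrs() fails. */
int pick_interface(char *name, size_t len)
{
    struct ifaddrs *list, *ifa;
    int best = 0;

    if (name == NULL || len == 0)
        return -1;
    name[0] = '\0';

    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        int r = rank_iface_flags(ifa->ifa_flags);
        if (r > best) {          /* keep the first interface of the best rank */
            best = r;
            snprintf(name, len, "%s", ifa->ifa_name);
        }
    }
    freeifaddrs(list);
    return best;
}
```

A return value of 1 tells the caller that the fallback path was taken, which
may be worth logging.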
Notes on disk metrics for other OSs. I have used a GNU lib file,
fsusage.c, to actually do the statfs() call to find the free/used disk
blocks, which should be relatively portable given the right #defines.
The only thing that is truly Linux-only is getting the list of
currently mounted devices. The file mountlist.c from the GNU fileutils
package (which includes the df command) may help shed some light on
how to do this on various operating systems.
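
For anyone porting, the two halves look roughly like this on Linux. This is
a sketch only: it uses POSIX statvfs() directly rather than fsusage.c, and
glibc's setmntent()/getmntent() for the mount list. The function names, the
/dev/ prefix filter, and the mount table path passed in by the caller are
assumptions for the example.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Portable part: total and free bytes for the filesystem holding path.
 * Returns 0 on success, -1 if statvfs() fails. */
int fs_usage(const char *path, unsigned long long *total_bytes,
             unsigned long long *free_bytes)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;
    /* f_blocks and f_bfree are counted in units of f_frsize. */
    *total_bytes = (unsigned long long)vfs.f_blocks * vfs.f_frsize;
    *free_bytes  = (unsigned long long)vfs.f_bfree  * vfs.f_frsize;
    return 0;
}

/* Less portable part: walk a mount table such as /proc/mounts and print
 * usage for entries backed by a /dev/ device.
 * Returns the number of entries printed, or -1 if the table cannot be opened. */
int print_local_mounts(const char *mtab_path)
{
    FILE *fp = setmntent(mtab_path, "r");
    struct mntent *m;
    int count = 0;

    if (fp == NULL)
        return -1;

    while ((m = getmntent(fp)) != NULL) {
        unsigned long long total, avail;

        if (strncmp(m->mnt_fsname, "/dev/", 5) != 0)
            continue;   /* assumed filter: skip proc, tmpfs, NFS, etc. */
        if (fs_usage(m->mnt_dir, &total, &avail) != 0)
            continue;
        printf("%s on %s: %llu of %llu bytes free\n",
               m->mnt_fsname, m->mnt_dir, avail, total);
        count++;
    }
    endmntent(fp);
    return count;
}
```

On another OS, fs_usage() should carry over largely unchanged, while
print_local_mounts() is the piece that would need a platform-specific
replacement.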
Federico
Rocks Cluster Group, Camp X-Ray, SDSC, San Diego
GPG Fingerprint: 3C5E 47E7 BDF8 C14E ED92 92BB BA86 B2E6 0390 8845