I just committed changes to the website and new disk metrics. The website changes primarily take advantage of the new metric attributes that Matt added recently.

The disk metrics will report local disk sizes. Although gmetric scripts do exist for this purpose, the new metrics are coded entirely in C, and should be much more efficient. There is also a new metric, "part_max_used", that reports how full the fullest local disk partition on the host is. Therefore, if an admin sees a value of 97% for a node, they know further attention is needed, although not which precise partition is the culprit. Unfortunately, these new metrics are only available on Linux for now.
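
To make the "fullest partition" idea concrete, here is a minimal sketch of how such a value could be computed. It is not the code from the patch: the helper names percent_used() and part_max_used_sketch() are made up for illustration, it uses POSIX statvfs() rather than the fsusage.c wrapper, and it assumes the caller already has a list of mount points.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Hypothetical helper: percent of blocks in use, or -1.0 if the
 * numbers make no sense (e.g. a pseudo filesystem with 0 blocks). */
static double percent_used(unsigned long long total_blocks,
                           unsigned long long free_blocks)
{
    if (total_blocks == 0 || free_blocks > total_blocks)
        return -1.0;
    return 100.0 * (double)(total_blocks - free_blocks)
                 / (double)total_blocks;
}

/* Hypothetical helper: highest percent used among the given mount
 * points. Mount points that cannot be queried are skipped. */
static double part_max_used_sketch(const char *const mounts[], size_t n)
{
    double max = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        struct statvfs vfs;
        double pct;

        if (statvfs(mounts[i], &vfs) != 0)
            continue;               /* unreadable mount point: ignore */
        pct = percent_used(vfs.f_blocks, vfs.f_bfree);
        if (pct > max)
            max = pct;
    }
    return max;
}
```

Calling part_max_used_sketch() with an array such as { "/", "/var" } returns a single number for the node, which matches the point above: the value says that some partition needs attention, not which one.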

The LOCATION attribute in the HOST tag specifies where the host is physically located in the cluster - Rack, Rank, and Plane. These are specified as a 3D Euclidean coordinate "x,y,z". This attribute enables technicians (or more probably, you and me ;) to locate a malfunctioning node in a large cluster quickly.
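
For anyone consuming the XML, a LOCATION value in the "x,y,z" form could be parsed roughly as follows. This is only a sketch: struct location and parse_location() are hypothetical names, and treating the three coordinates as integers is an assumption on my part.

```c
#include <stdio.h>

/* Hypothetical container for a parsed LOCATION value. */
struct location {
    int x, y, z;
};

/* Parses text of the form "x,y,z" (integers assumed).
 * Returns 0 on success, -1 on malformed input. */
static int parse_location(const char *text, struct location *out)
{
    int x, y, z;
    char trailing;

    if (text == NULL || out == NULL)
        return -1;
    /* The extra %c catches garbage after the third number:
     * a clean "x,y,z" string converts exactly three items. */
    if (sscanf(text, "%d,%d,%d%c", &x, &y, &z, &trailing) != 3)
        return -1;
    out->x = x;
    out->y = y;
    out->z = z;
    return 0;
}
```

A front end could then sort or group hosts by these coordinates to draw a physical view of the cluster.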

Other changes included in this patch are outlined in the cvs log below.


Disk Metrics for Linux Patch.

My contribution before the 2.5.0 test period. Added three disk metrics
for Linux: disk_total, disk_free, and part_max_used. The first two
should be self-explanatory; the last one means "of all local disk
partitions on this node, the fullest one is x% used." Matt suggested
this as an early warning system for sysadmins.

Other new features in this patch include true support of the LOCATION
attribute in the HOST tag, the ability to specify trusted hosts with
DNS names as well as IP addresses, and the get_first_interface() method
to find a network interface to use. During testing, I found some false
negatives: interfaces that said they were not "multicast enabled", but
worked fine with ganglia. Now, if we cannot find a multicast interface,
we fall back to any UP network interface other than loopback.

Notes on disk metrics for other OSs. I have used a GNU lib file,
fsusage.c, to actually do the statfs() call to find the free/used disk
blocks, which should be relatively portable given the right #defines.
The only thing that is truly linux-only is getting the list of
currently mounted devices. The file mountlist.c from the GNU fileutils
package (which includes the df command) may help shed some light on
this query for various operating systems.
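
As a rough picture of the Linux-only part, listing mounted devices can be done with the glibc getmntent() family. The sketch below is not the code in the patch: for_each_local_mount() is a hypothetical name, and treating only entries whose device name starts with "/dev/" as local partitions is an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <mntent.h>

/* Walks a mount table (e.g. "/proc/mounts" or "/etc/mtab") and calls
 * visit() for every entry whose device starts with "/dev/".
 * Returns the number of such entries, or -1 if the table cannot be
 * opened. visit may be NULL if only the count is wanted. */
static int for_each_local_mount(const char *table,
                                void (*visit)(const char *dev,
                                              const char *dir))
{
    FILE *fp = setmntent(table, "r");
    struct mntent *ent;
    int count = 0;

    if (fp == NULL)
        return -1;
    while ((ent = getmntent(fp)) != NULL) {
        if (strncmp(ent->mnt_fsname, "/dev/", 5) != 0)
            continue;               /* skip proc, tmpfs, NFS, etc. */
        if (visit != NULL)
            visit(ent->mnt_fsname, ent->mnt_dir);
        count++;
    }
    endmntent(fp);
    return count;
}
```

On other operating systems this is the piece that would need replacing, while the per-partition block counts could stay the same.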



Federico

Rocks Cluster Group, Camp X-Ray, SDSC, San Diego
GPG Fingerprint: 3C5E 47E7 BDF8 C14E ED92  92BB BA86 B2E6 0390 8845
