Hi,

We recently upgraded some of our Linux clusters to a RHEL5-compatible OS
and started seeing gmond crash with messages like:

Counting device /dev/root (15.53 %)
Counting device /dev/md0 (0.19 %)
Counting device /dev/sda5 (6.98 %)
Counting device /dev/sda1 (11.49 %)
For all disks:
407231364905519606697529389829805204733315484899369542541346387533419262787299224947497188692801554958860593825591620554547888730849604995610856265336351901876345759954388162545720331481647613311710044245767970702755986166046278757927757021184.000
 GB total, 0.000 GB free for users.
*** stack smashing detected ***: gmond terminated

After some debugging, it turns out that the NFS implementation in RHEL5
can fill up /proc/mounts with entries for files accessed from mounted
filesystems when the fileserver returns a different fsid or filehandle
for an accessed file; see this thread:

http://linux-nfs.org/pipermail/nfsv4/2005-March/001363.html

These entries in /proc/mounts are short-lived, but the device and mount
point fields can be quite long, since they often contain the whole file
path.
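To check whether a particular host is affected, one can measure the longest line in /proc/mounts. A minimal sketch using only standard C stdio; longest_line_len() is a hypothetical diagnostic helper, not part of gmond. It reads one character at a time, so it is not itself limited by a buffer size.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical diagnostic helper (not part of gmond): return the length,
 * excluding the trailing newline, of the longest line in fp.
 * Reads character by character, so no line buffer can be overrun. */
static size_t longest_line_len(FILE *fp)
{
    size_t longest = 0, current = 0;
    int c;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (current > longest)
                longest = current;
            current = 0;
        } else {
            current++;
        }
    }
    /* Account for a final line that has no trailing newline. */
    if (current > longest)
        longest = current;
    return longest;
}
```

Called on a stream from fopen("/proc/mounts", "r"), and assuming gmond passes sizeof(procline) to fgets(), a result of 255 or more means at least one entry would have been split across two fgets() calls with the old procline[256]; with procline[1024] the threshold is 1023.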

In gmond, the monitor-core/libmetrics/linux/metrics.c:find_disk_space()
function not only used small character arrays, but the arrays filled by
the sscanf() after the fgets() were also smaller than the buffer holding
the line it had just read. This can lead to buffer overflows and the
"stack smashing" problem we were seeing.
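The mismatch can be made concrete. Assuming fgets() is called with sizeof(procline), a line of up to 255 characters fits in procline[256], but an unbounded %s conversion into device[128] only has room for 127 characters plus the terminating NUL. A minimal sketch of a hypothetical checker (not part of gmond) that reports whether any of the first four whitespace-separated fields of a line would overflow the pre-patch arrays; the sizes are copied from the removed declarations, and the field order is assumed to follow /proc/mounts (device, mount point, type, options).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Destination sizes from the pre-patch find_disk_space(), in the assumed
 * scan order: device[128], mount[128], type[32], mode[128]. */
static const size_t old_sizes[4] = { 128, 128, 32, 128 };

/* Hypothetical checker: return 1 if any of the first four fields of line
 * is too long for its pre-patch array, i.e. an unbounded %s would write
 * past the end of it; otherwise return 0. */
static int overflows_old_arrays(const char *line)
{
    const char *ws = " \t\n";
    size_t i;

    assert(line != NULL);
    for (i = 0; i < 4; i++) {
        size_t len;

        line += strspn(line, ws);   /* skip leading whitespace */
        len = strcspn(line, ws);    /* length of this field */
        if (len == 0)
            break;                  /* fewer than four fields */
        if (len >= old_sizes[i])    /* no room left for the NUL */
            return 1;
        line += len;
    }
    return 0;
}
```

A 128-character device path is enough to trigger it, even though the whole line still fits comfortably in the old 256-byte line buffer.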

To fix our problem and prevent the overflows, I made a patch that
increases the size of the arrays and also makes each array used in the
sscanf() the same size as the line buffer used in fgets(), so there is no
chance of another overflow.
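A minimal sketch of the invariant the patch relies on, again assuming the fields are scanned in /proc/mounts order: because fgets() stores at most sizeof(procline) - 1 characters, no whitespace-separated token in procline can be longer than that, so a %s destination of the same size always has room. PROCLINE_SIZE, struct mount_fields and parse_mounts_line() are hypothetical names; the explicit %1023s widths are an extra safeguard that is not in the patch and never truncates while the precondition holds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PROCLINE_SIZE 1024  /* hypothetical; same size as procline[1024] */

/* Hypothetical holder for one parsed /proc/mounts entry. Every field has
 * the same size as the line buffer, as in the patch. */
struct mount_fields {
    char device[PROCLINE_SIZE];
    char mount[PROCLINE_SIZE];
    char type[PROCLINE_SIZE];
    char mode[PROCLINE_SIZE];
};

/* Parse the first four fields of a line read with
 * fgets(procline, PROCLINE_SIZE, fp). Returns the sscanf() count
 * (4 on success). The 1023 widths (PROCLINE_SIZE - 1) cannot truncate a
 * field, because the whole line is already shorter than PROCLINE_SIZE. */
static int parse_mounts_line(const char *line, struct mount_fields *out)
{
    assert(line != NULL && out != NULL);
    assert(strlen(line) < PROCLINE_SIZE);  /* precondition from fgets() */
    return sscanf(line, "%1023s %1023s %1023s %1023s",
                  out->device, out->mount, out->type, out->mode);
}
```

The literal 1023 in the format string has to be kept in sync with PROCLINE_SIZE by hand; the width is redundant with equal-sized buffers, but it keeps the code safe if someone later shrinks one of the field arrays.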

~Jason


-- 
/------------------------------------------------------------------\
|  Jason A. Smith                          Email:  smit...@bnl.gov |
|  Atlas Computing Facility, Bldg. 510M    Phone: +1-631-344-4226  |
|  Brookhaven National Lab, P.O. Box 5000  Fax:   +1-631-344-7616  |
|  Upton, NY 11973-5000,  U.S.A.                                   |
\------------------------------------------------------------------/

Index: monitor-core/libmetrics/linux/metrics.c
===================================================================
--- monitor-core/libmetrics/linux/metrics.c	(revision 2006)
+++ monitor-core/libmetrics/linux/metrics.c	(working copy)
@@ -1202,8 +1202,8 @@
 float find_disk_space(double *total_size, double *total_free)
 {
    FILE *mounts;
-   char procline[256];
-   char mount[128], device[128], type[32], mode[128];
+   char procline[1024];
+   char device[1024], mount[1024], type[1024], mode[1024];
    /* We report in GB = 1e9 bytes. */
    double reported_units = 1e9;
    /* Track the most full disk partition, report with a percentage. */


_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
