Hi,

We recently upgraded some of our Linux clusters to a RHEL5-compatible OS and started having problems with gmond crashing, with messages like:
    Counting device /dev/root (15.53 %)
    Counting device /dev/md0 (0.19 %)
    Counting device /dev/sda5 (6.98 %)
    Counting device /dev/sda1 (11.49 %)
    For all disks: 407231364905519606697529389829805204733315484899369542541346387533419262787299224947497188692801554958860593825591620554547888730849604995610856265336351901876345759954388162545720331481647613311710044245767970702755986166046278757927757021184.000 GB total, 0.000 GB free for users.
    *** stack smashing detected ***: gmond terminated

After some debugging, it turns out that the NFS implementation in RHEL5 can fill up /proc/mounts with entries for files accessed on mounted filesystems whenever the file server returns a different fsid or filehandle for an accessed file; see this thread:

http://linux-nfs.org/pipermail/nfsv4/2005-March/001363.html

These entries in /proc/mounts are short-lived, but their device and mount-point fields can be quite long, since they often contain the whole file path.

In gmond, the find_disk_space() function in monitor-core/libmetrics/linux/metrics.c not only used small character arrays, but the arrays filled by the sscanf() after the fgets() were smaller than the buffer holding the line it had just read. A single long field could therefore overflow its array, which is what caused the "stack smashing" crash we were seeing.

To fix our problem and prevent the overflows, I made a patch that increases the size of the arrays and makes each array used in the sscanf() the same size as the line buffer used in fgets(). Since every field is a substring of that line, no field can be longer than the array it is copied into, so there is no chance of another overflow.

~Jason

-- 
/------------------------------------------------------------------\
| Jason A. Smith                        Email: smit...@bnl.gov |
| Atlas Computing Facility, Bldg. 510M  Phone: +1-631-344-4226 |
| Brookhaven National Lab, P.O. Box 5000  Fax: +1-631-344-7616 |
| Upton, NY 11973-5000, U.S.A.                                 |
\------------------------------------------------------------------/
Index: monitor-core/libmetrics/linux/metrics.c
===================================================================
--- monitor-core/libmetrics/linux/metrics.c	(revision 2006)
+++ monitor-core/libmetrics/linux/metrics.c	(working copy)
@@ -1202,8 +1202,8 @@
 float find_disk_space(double *total_size, double *total_free)
 {
    FILE *mounts;
-   char procline[256];
-   char mount[128], device[128], type[32], mode[128];
+   char procline[1024];
+   char device[1024], mount[1024], type[1024], mode[1024];
    /* We report in GB = 1e9 bytes. */
    double reported_units = 1e9;
    /* Track the most full disk partition, report with a percentage. */
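For illustration, here is a minimal, self-contained C sketch of the parsing pattern the patch relies on. It is not the actual gmond code: the names PROCLINE_LEN, struct mount_entry and parse_mount_line() are hypothetical. Every field buffer is as large as the fgets() line buffer, so any whitespace-delimited token from a line fits; the explicit %1023s widths are an extra safeguard that the patch itself does not add.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical size of the fgets() line buffer; 1024 mirrors the patch. */
#define PROCLINE_LEN 1024

/* One parsed /proc/mounts entry (hypothetical struct). Each field buffer
 * is as large as the whole line buffer, so no single token can overflow. */
struct mount_entry {
    char device[PROCLINE_LEN];
    char mount[PROCLINE_LEN];
    char type[PROCLINE_LEN];
    char mode[PROCLINE_LEN];
};

/* Parses one line as read by fgets(procline, PROCLINE_LEN, fp).
 * Returns 1 if all four leading fields were found, 0 otherwise.
 * The width 1023 must stay equal to PROCLINE_LEN - 1 (room for the NUL);
 * it bounds each copy even if a caller passes a longer string. */
static int parse_mount_line(const char *line, struct mount_entry *e)
{
    assert(line != NULL && e != NULL);
    return sscanf(line, "%1023s %1023s %1023s %1023s",
                  e->device, e->mount, e->type, e->mode) == 4;
}
```

A caller would read /proc/mounts with fgets(procline, PROCLINE_LEN, fp) and pass each line to parse_mount_line(). Note that a line longer than 1023 characters is split by fgets() into several chunks, which this sketch would parse as separate, bogus entries; the buffers stay safe, but a real caller may want to detect and skip such truncated lines.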
_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers