Sounds like a good idea to me. I'll make the change and put it in CVS by tonight. We'll test it and adjust as necessary.

-fds

On Monday, July 14, 2003, at 11:16 PM, Albert Strasheim wrote:

Hello,

Seems these systems are exhibiting the same problem as Jason described.

[EMAIL PROTECTED] albert]$ cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
/dev/hda1 /boot ext3 rw 0 0
none /dev/pts devpts rw 0 0
/dev/hdb1 /home ext3 rw 0 0
none /dev/shm tmpfs rw 0 0

Counting / twice would give the ~110 GB I'm seeing in the reports. How about only counting entries in /proc/mounts whose first field (the device name) starts with "/dev/"?

Here is the df -a output again:

[EMAIL PROTECTED] albert]$ df -a
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda3             36298348   5068852  29385636  15% /
none                         0         0         0   -  /proc
/dev/hda1               101089      9067     86803  10% /boot
none                         0         0         0   -  /dev/pts
/dev/hdb1             38464340    145436  36365000   1% /home
none                    514748         0    514748   0% /dev/shm

From strace I see that df reads both /proc/mounts and /etc/mtab. Parsing only the entries starting with "/dev/" in /etc/mtab would probably also yield the right results on these machines.

[EMAIL PROTECTED] albert]$ cat /etc/mtab
/dev/hda3 / ext3 rw 0 0
none /proc proc rw 0 0
/dev/hda1 /boot ext3 rw 0 0
none /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/hdb1 /home ext3 rw 0 0
none /dev/shm tmpfs rw 0 0

If you have a favourite solution, I could prepare a patch. :)

Cheers,

Albert

----- Original Message -----
From: "Federico Sacerdoti" <[EMAIL PROTECTED]>
To: "Albert Strasheim" <[EMAIL PROTECTED]>
Cc: <[email protected]>
Sent: Monday, July 14, 2003 10:08 PM
Subject: Re: [Ganglia-developers] Erroneous sys_clock and disk space values
(Ganglia 2.5.3, Red Hat 7.2)



On Sunday, July 13, 2003, at 11:09 AM, Albert Strasheim wrote:

Secondly, Ganglia seems to think that there is 113.830 GB of disk installed in the machines, instead of ~80 GB. df -a output is as follows.

[EMAIL PROTECTED] albert]$ df -a
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda3             36298348   4655088  29799400  14% /
none                         0         0         0   -  /proc
/dev/hda1               101089      9067     86803  10% /boot
none                         0         0         0   -  /dev/pts
/dev/hdb1             38464340     70012  36440424   1% /home
none                    514748         0    514748   0% /dev/shm


This is a known problem. I wrote the disk_total metric and I'm afraid
it is not accurate in all situations, as you have found.

The "disk_total" logic is as follows:
1. Read and parse /proc/mounts.
2. Call statvfs() on each mount point to find the total blocks per filesystem; sum them together and multiply by the block size. Device names we have seen before are skipped (necessary so as not to count automounted devices more than once, especially users' home directories).
3. That's it. There are no other smarts about "special" filesystems such as root and shmem.

I will work on making the disk metrics more accurate; however, more information from users would help very much. Does anyone else see erroneous disk_total figures?

-fds

Jason Smith first found this problem. Here is his March 3, 2003 post:
--------
...
gmond's value is a lot
closer to what I add up with df, but still a little bit off and I am not
sure why.  I noticed that the device_space function skips filesystems
that it has seen before, but it seems to go by the device name, not the
mount point.  On my desktop (RedHat 2.4.18-24.7.x kernel), I seem to
have the root filesystem listed twice with different device names:

rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0

Is this being counted twice?

~Jason

PS. Should devices with name equal to none also be skipped, like the
Linux kernel's shared memory fs?  From /proc/mounts I have:

none /dev/shm tmpfs rw 0 0

And this filesystem does report some space that might be getting added
up with all the rest:

none                    256816         0    256816   0% /dev/shm
----


Federico

Rocks Cluster Group, San Diego Supercomputer Center, CA

