Hi All, I ran into a problem last week that makes me think there's a variable hiding somewhere with an artificial maximum that limits the total number of hosts in a cluster.
Here's what happened:
* brought online a new host
* the host did not get a directory created in /var/lib/ganglia/rrds/my_cluster/
* I noticed that the partition holding /var/lib/ganglia/rrds was 100% full
* I ran 'find /var/lib/ganglia/rrds -mtime +30 -type f -exec rm {} \;', which
got the partition down to 87% full
* the new host directory still didn't appear.
* I restarted gmetad and gmond in lots of places, hoping it would help. It didn't.
* I waited over the weekend because it was Friday.
* I ran
find /var/lib/ganglia/rrds -type f -mtime +2 -exec rm {} \;
find /var/lib/ganglia/rrds -type d -exec rmdir {} \; (knowing it would fail
on non-empty directories)
* the more aggressive removal of older files dropped the file count under rrds
from ~280,000 to ~200,000
* the empty directory deletion dropped the number of directories from 8999 to 4858
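For anyone who wants to repeat the cleanup, here is a minimal sketch of the two steps above as a single shell function. The name prune_rrds and its arguments are made up for illustration, and it assumes GNU find (for -delete, -empty and -mindepth). It uses -empty so that only empty directories are touched, instead of letting rmdir fail on the non-empty ones.

```shell
#!/bin/sh
# prune_rrds DIR DAYS  (hypothetical helper, not part of Ganglia)
# Removes regular files under DIR whose mtime is more than DAYS days old,
# then removes any directories under DIR that are left empty.
prune_rrds() {
    dir="$1"
    days="$2"
    # Step 1: delete old files (same test as 'find ... -mtime +N -type f').
    find "$dir" -type f -mtime +"$days" -delete
    # Step 2: delete empty directories. -delete implies depth-first
    # traversal, so nested empty directories are removed bottom-up.
    # -mindepth 1 keeps DIR itself in place.
    find "$dir" -mindepth 1 -type d -empty -delete
}
```

Something like 'prune_rrds /var/lib/ganglia/rrds 2' would roughly match what I ran by hand. Swap both -delete actions for -print first to see what would go away.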
I checked inode usage (df -ih) before deleting directories to make sure
that wasn't it; even before deleting anything the partition still had
~40% inodes free.
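The checks above can be sketched as a small set of shell functions. The function names are hypothetical, and the sketch assumes a df that supports -i (GNU and BSD do) and a find that supports -mindepth.

```shell
#!/bin/sh
# Hypothetical helpers for summarizing usage under an rrds directory.

# Number of regular files under DIR.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Number of directories under DIR, not counting DIR itself.
count_dirs() {
    find "$1" -mindepth 1 -type d | wc -l | tr -d ' '
}

# Print block usage, inode usage, and file/directory counts for DIR.
report_usage() {
    df -h "$1"    # space used on the filesystem holding DIR
    df -i "$1"    # inodes used on the same filesystem
    echo "files: $(count_files "$1")"
    echo "dirs:  $(count_dirs "$1")"
}
```

Running 'report_usage /var/lib/ganglia/rrds' before and after a cleanup gives the same kind of numbers quoted above in one place.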
I have a tmpfs ramdisk mounted on /var/lib/ganglia/rrds, but I checked
tmpfs's limits for number of subdirectories (by creating 60,000 of them)
to make sure it wasn't some obscure filesystem limit.
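The subdirectory test was roughly along these lines. This is a sketch, not the exact commands I typed; make_subdirs is a made-up name, and the scratch directory should live on the filesystem being tested.

```shell
#!/bin/sh
# make_subdirs BASE N  (hypothetical helper)
# Creates N empty subdirectories directly under BASE, stopping at the first
# failure, then prints how many subdirectories BASE contains.
make_subdirs() {
    base="$1"
    n="$2"
    i=1
    while [ "$i" -le "$n" ]; do
        mkdir "$base/d$i" || { echo "mkdir failed at $i" >&2; return 1; }
        i=$((i + 1))
    done
    find "$base" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}
```

For example, create a scratch directory under /var/lib/ganglia/rrds, run 'make_subdirs' on it with 60000, and remove the scratch directory afterwards. If the filesystem had a per-directory limit below that, mkdir would fail partway and the function would report where.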
Thoughts?
-ben
--
Ben Hartshorne
email: [EMAIL PROTECTED]
http://ben.hartshorne.net
_______________________________________________ Ganglia-developers mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/ganglia-developers
