Hello, I'm playing a game of cat and mouse with process accounting and disk space. I built some boxes with 9GB /var partitions, rolled them into production, and after about 4 days of full load, /var filled up.
Looking at the size of /var/account/acct{,.0}, and figuring I'd see a 200% load increase in about a month, I created a new label from the large chunk of free space I had saved for situations like this. With 40GB mounted on /var/account, usage was down to 20%, and I thought the crisis was averted.

About a week and a half later, I got a disk-full e-mail from nagios, and this showed up in my dailies again:

> +pid 94696 (gzip), uid 0 inumber 6 on /var/account: filesystem full

My /var/account/acct file was 17GB. Add one rotation before compression, and I completely lose the feeling of cleverness I had when I gave accounting a dedicated 40GB partition.

If you're wondering how I can possibly have this much accounting data, two `vmstat -f' invocations 100 seconds apart show 32282 forks (an average of about 323 per second). These boxes run squid with a redirect script to implement a captive portal. There are generally several hundred unauthenticated users. All of their HTTP traffic, from Firefox to the little weather widgets and spyware phoning home, gets proxied through squid and subsequently through a redirect script that, among other things, does some text munging on the URL and queries various ipfw tables to determine what "context" the user is in. Some of this could be optimized to launch fewer processes, but the code would be less maintainable.

I only really see two options, neither of which I particularly like:

* Throw more disk at the problem (but given what I've seen, I suspect that within a month or two I'd realize I still hadn't given it enough).
* Turn off accounting on these boxes.

Are these really my only options? Is there any kind of tuning I can do?

-- 
Chris Cowart
Network Technical Lead
Network & Infrastructure Services, RSSP-IT
UC Berkeley