On Thursday, August 22, 2013 07:26:33 AM [email protected] wrote:
> > * When one of the two www servers is unreachable
> > * When www-0 (the one hosting mailman) soaks up all its CPU
> > * When postfix goes down
> > * When space.synhak.org isn't reachable from the 'net
> > * When the webcam isn't reachable from the 'net
> > * When our monthly AWS expenditures go above $15
> >
> > It's not Nagios, but it's a start. I (and whoever else wants to opt in)
> > get a text message when one of those alarms is triggered.
>
> Here's a funny thought: send the alarms to Phong and have the bot fix 'em :)
>
> My friend had some scripts/batch that would check to see if a service is
> running and, if not, restart that service. Would something like
> that help reduce alarms like these?

That's systemd. It's just a matter of configuring it. However, I don't want it
to restart automatically. When it dies, it is because the system has run out of
RAM. If the system is so starved for RAM that it kills programs to free some
up, immediately restarting that program will only make the problem worse.

> Were you able to try Nagios? I saw a talk about it at ALUG; it sounded
> really nice.

Eventually we'll get to Nagios. I'm not really sure how we can efficiently fit
it into our infrastructure yet, since I imagine we'd actually want a third
server on our AWS cluster to support mail, mailman, running Phong's various
scripts, and other random things that need a server with a reliable
connection. I understand that Chris has a machine out there somewhere on the
'net that we can use.

_______________________________________________
Discuss mailing list
[email protected]
http://synhak.org/mailman/listinfo/discuss
