You could also use Nagios's built-in flap detection and host/service dependencies to help with alert floods.
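
Roughly what I mean, as an untested sketch rather than anything from a real config. The host names, service names, the generic-service template and the threshold values are all placeholders I made up, and flap detection also has to be switched on globally with enable_flap_detection=1 in nagios.cfg:

```
# Hypothetical example: every name and number here is an assumption.
define service {
    use                     generic-service   ; assumed template from the sample config
    host_name               web01             ; placeholder host
    service_description     HTTP
    check_command           check_http
    flap_detection_enabled  1                 ; hold back notifications while the service flaps
    low_flap_threshold      5.0               ; % state change below which flapping is considered over
    high_flap_threshold     20.0              ; % state change above which flapping starts
}

# Don't notify about HTTP on web01 while the upstream switch is unreachable.
define servicedependency {
    host_name                       switch01  ; placeholder master host
    service_description             PING
    dependent_host_name             web01
    dependent_service_description   HTTP
    notification_failure_criteria   c,u       ; suppress when master is CRITICAL or UNKNOWN
    execution_failure_criteria      n         ; keep running the dependent check regardless
}
```

With something like that in place, a dead switch should page once for the switch instead of once for every service behind it, and a service bouncing between OK and CRITICAL gets a single flapping notice rather than a page on every state change.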

On Tue, Aug 26, 2014 at 8:28 AM, Alan Robertson <[email protected]> wrote:

> There's kind of a cool tool for connecting Nagios to PagerDuty called
> Flapjack - designed to avoid flooding people with pages when things go
> south.
>
> http://flapjack.io/
>
> On 08/25/2014 05:02 PM, Lawrence K. Chen, P.Eng. wrote:
>
>> On 08/25/14 13:44, Warner wrote:
>>
>>> On Mon, Aug 25, 2014 at 06:09:10AM -0700, Nathan Clemons
>>> ([email protected]) wrote:
>>>
>>>> We're looking to set up small teams in nagios and rotate between
>>>> primary and secondary contacts, vs having one global on call person.
>>>> (Ie, two networking folks, two vmware folks, two Unix folks, etc.)
>>>> What kind of solutions have folks tried for this? Pagerduty seems
>>>> excessively priced for this kind of task, especially when we're trying
>>>> to trim opex costs. When I worked at /. we used sendmail aliases to
>>>> control the paging and just ran a script from cron to adjust the list
>>>> to the next person in line on Monday morning.
>>>
>>> In the past, I've used qmail dot files and shell scripts. Standardized
>>> the contacts on e-mail aliases. That can work well.
>>>
>>> Since then, I've become a big fan of Pager Duty. Not having to maintain
>>> a separate schedule, having a central point for notifications, and
>>> additional bells and whistles such as notification when going on call
>>> are huge wins.
>>>
>>> Both approaches work well. Pager Duty does have value though, I wouldn't
>>> write it off.
>>>
>>> Warner
>>
>> I don't know much about pagerduty, except one group on campus that
>> shares our Nagios server is using it.
>>
>> So, their Perl script to tie into Nagios hasn't left a good impression
>> on me.
>>
>> A couple of weeks after I had set it up, I noticed it had spawned 1000s
>> of copies of itself and our server was close to death....clearing, it
>> would just start building up again. I thought about making a promise to
>> deal with it in the short term, though I couldn't recall if CFEngine 2.2
>> had the capability or what its syntax might be. Saw there were some
>> notifications queued, and that they were all hanging on that....seems
>> the first gets stuck on it, and the rest get stuck on the first process
>> still being there.
>>
>> In trying to see what it was doing...found it's trying to post to some
>> https URL through LWP. Except it still seems that after 10+ years...LWP
>> https through a proxy is still busted, so don't know why this script
>> would expect to work....
>>
>> And, a proxy is needed because the server is in private IP space
>> (eventually our entire datacenter will be....though sounds like it'll
>> all be behind our F5, but it's been WIP for almost a couple of years
>> now.)
>>
>> In the meantime it's a largely neglected/forgotten squid proxy server
>> that I threw up back in 2007 to replace the one that everybody depended
>> on, but nobody claimed ownership for when the last of some UltraServer
>> 2's were decommissioned. It's running in a Solaris Zone, which has been
>> moved and undergone upgrade on attach a few times....
>>
>> After a couple of days, I opted for an earlier suggestion I had found
>> online. I used a Perl module of LWP-Proxy-Connect (still waiting to see
>> if it'll get accepted into FreeBSD Ports) to make the script work. Just
>> a one line change to the pagerduty script, IIRC, and it started working
>> again....
>>
>> That was until I let CFEngine loose again, and it reverted it :)
>>
>> While I was working on it, the group using it finally logged a couple
>> of tickets...one about "unable to fork" errors, and that they had
>> stopped getting notifications, where they thought there should've been
>> some on the weekend. (they were the ones killing my Nagios server.)
>>
>> Later, they added that it had worked up to the Friday before....
>>
>> Finally they admitted that they had changed it that Friday from using
>> email to posting to web for notifications. (had I known, I might have
>> just suggested they switch it back :)
>>
>> Hadn't really thought about our notifications from this Nagios server
>> now being dependent on our smtp server....our old server had been in the
>> datacenter range that is completely open to the world....so it did its
>> own mail delivery (especially important when it used to largely inform
>> us of problems with campus email...) Though it's getting hard for me to
>> handle notifications timely/safely....
>
> _______________________________________________
> Discuss mailing list
> [email protected]
> https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
> This list provided by the League of Professional System Administrators
> http://lopsa.org/
