David Krider wrote:
> I've gotten a second Nagios server setup to work as a failover for a
> primary server. I think I've been thorough. The secondary server is
> successfully receiving and processing both passive host and passive
> service checks. Notifications and both kinds of active checks are turned
> off. When I stop the Nagios process on the primary server, the secondary
> fails the freshness check of the primary instance check. However,
> nothing happens after this, and it seems that I have 2 problems.
>
> 1) Even though I have max_check_attempts set to 1 on the master server's
> "check_nagios" check, it just continues to force active checks when the
> freshness times out. I expect it to fail hard, and stop checking the
> freshness. Maybe I'm wrong, though. Maybe the expected behavior here is
> to get the active host check going, and then the freshness will stop
> complaining.
>
> 2) All of my scripts seem to be lined up. The event handler fires, and I
> see proper things in the nagios.cmd file.
>
> [1246294798] ENABLE_NOTIFICATIONS
> [1246294798] START_EXECUTING_SVC_CHECKS
> [1246294820] START_EXECUTING_HOST_CHECKS
>
> I know the command file is being processed because I can get the
> secondary server to force checks from the cgi's. However, none of these
> things commands ever work, whether I force them from the command line,
> or from the cgi's. What could be keeping these from taking effect? I've
> been all over this thing for a couple days now, and I think my eyes are
> starting to glaze over.
>
> The only thing I can think of would be to enable all of these things in
> the master config file, but then immediately force them "off" when I
> start up the process. Then maybe it will work to turn them back on
> later? That can't be right...
>
> Desperately,
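
For reference, the lines quoted above follow the Nagios external command format: a Unix timestamp in square brackets, then the command name, with any arguments separated by semicolons. A minimal Python sketch of producing such lines is below. The helper names `format_external_command` and `submit_command` are hypothetical, and the command file path is an assumption; the real path is whatever `command_file` points to in nagios.cfg.

```python
import time

# Assumed location; the real path is the command_file setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_external_command(name, args=(), now=None):
    """Return one external command line: "[timestamp] NAME;arg1;arg2\\n"."""
    timestamp = int(time.time() if now is None else now)
    fields = ";".join([name, *map(str, args)])
    return f"[{timestamp}] {fields}\n"


def submit_command(name, args=(), path=COMMAND_FILE):
    """Write one command line to the command file (a named pipe on a live system)."""
    with open(path, "w") as pipe:
        pipe.write(format_external_command(name, args))
```

Calling `format_external_command("ENABLE_NOTIFICATIONS", now=1246294798)` gives the first line from the quoted nagios.cmd excerpt, with a trailing newline.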
Use merlin. It was designed for setting up redundant/loadbalanced systems
and will transfer your check results between your two nagios instances
seamlessly. Check takeover happens automagically too: both servers try to
schedule the check, and whichever instance happens to execute it first
keeps executing it until its latency reaches 15 seconds. At that point the
server that didn't execute the check originally takes it over
automatically, because the check is already in its scheduling queue at the
right moment.

You can find merlin at http://git.op5.org/git/nagios/merlin.git.
The project page is at http://www.op5.org/community/projects/merlin

HTH

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
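
The takeover rule described in the reply can be pictured with a toy model. This is only an illustration of the rule as stated in the message, not Merlin's actual code; the function name `next_owner` and the node labels are hypothetical, and the 15-second threshold is taken from the text above.

```python
TAKEOVER_LATENCY = 15.0  # seconds, the threshold mentioned in the reply


def next_owner(current_owner, peer, latency):
    """Return which node runs the check next in this toy model.

    latency is how late, in seconds, the current owner is in running the check.
    """
    if latency >= TAKEOVER_LATENCY:
        # The peer also has the check in its own scheduling queue, so it can step in.
        return peer
    return current_owner
```

With a latency of 2 seconds the current owner keeps the check; once the latency reaches 15 seconds the peer takes it over.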