On 01/25/2012 11:44 PM, Mike Lindsey wrote:
> There are a lot of options.. DNX, Merlin, mod_gearman to name a few...
> I could read the docs (and have read a good portion of some of them) and
> could implement test environments (and will eventually need to) but
> first I want opinions from people who've done this at large scale.
>
> I need to improve on our load distribution and failover mechanisms.
> Right now worker node outages are handled through freshness checking,
> and master node outages are handled through a load balanced vip and some
> fancy cron jobs that kick up a cold spare.
>
> What are the better options for local load distribution and geographic
> master failover? Which options will better handle thousands of servers
> across a dozen colos, in half a dozen countries, when the goal is that
> no single host (or colo!) going offline can be allowed to have an effect
> on any other subset of the infrastructure? Which options should I avoid?
>
> Currently running Nagios Core 3.2.1 with NSCA 2.9 on mostly FreeBSD
> systems. Soon that should be Core 3.3, with XI on top, plus whatever
> load distribution mechanism wins the dog fight.
>
For failover, Merlin is the only solution. If a poller at some colo or in
some country goes down, the master will try to take over its checks,
unless you tell it not to.

mod_gearman is probably more efficient at running checks with minimal CPU
usage on multiple nodes, at least until the new check engine in vanilla
Nagios is completed. After that, the in-core one will be superior to all
other options for simply distributing load, although it still won't do
failover.

To forestall your question "When can I expect to see that new check
stuff?", the answer is "in 9 weeks' time, tops". That's when my deadline
for it expires. Mid-April, if you hate week counts as much as I do.

-- 
Andreas Ericsson                   andreas.erics...@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null