I appreciate it! Are there any slides available for the talk?
--
Thanks!
John Malloy
johnmalloy AT comcast DOT net

-------------- Original message ----------------------
From: "Daniel Clark" <[EMAIL PROTECTED]>
> Sort of garbled, but thought they might be of use to someone
>
> Are the slides available anywhere?
>
> = John Rouillard - Nagios and SEC =
>
> == Nagios ==
>
> Nagios, unlike some other FLOSS software, has correlation - parents and others
> * Limited cause/effect detection
>
> Don't use host_name in "define service" stanza -- use hostgroup_name instead!
> * Has a test on each host where it looks up its own name to make sure
> DNS is working on that host
> * Flap detection is problematic - he leaves it turned off
> * Nagios can put performance data "somewhere" - DB, RRD etc.
> * is_volatile useful in special cases
> * read the manual - twice
>
> Correlation - find the fingerprint - only be notified of things that matter
>
> Nagios 3 will support defining own variables - write up on hack to do
> this now how to monitor SSL is on nagios-users list (find post)
> check_ldaps
> * I think this post is:
> http://article.gmane.org/gmane.network.nagios.user/40093/match=ldaps
>
> Servicegroup - bundle of a group of services that provide a
> customer-visible server (e.g. db2, websphere app server, apache)
>
> Serviceextinfo/Hostextinfo going away in Nagios 3 -- info shifts to
> becoming attributes of service and host objects
>
> Nagios 3 in alpha now.
>
> * Nagios really a service monitoring program, not a host monitoring system
>
> Many other monitoring projects are missing correlation.
>
> Nagios 2 - host checks are done in series (in Nagios 3, they will be in
> parallel)
>
> Correlation includes (slide) Topology, Thresholds, Service, Cluster
> (meta) plugin, Flap detection (doesn't quite work, but SEC replaces
> it)
>
> Tricks:
> * Links to TWiki for a knowledge base for services, hosts, addl commands
> * Can change html pages - he has "Unack Svc Probs" - on-call person
> lives in this screen
> * Downtime scheduling
>
> * He uses cacti and rt integrated with twiki - interesting feature -
> find last ticket in RT that mentions system
> * connect via (ajaxterm?)
> * look at nagios definitions
> * (cacti not from nagios - he doesn't like nagios for rrd stuff - he
> uses drraw instead)
> * Also have wiki pages for services
> * Nagios just has link - no two-way automation, but don't really
> need it in this case - wiki-side templates for hosts and services do
> exist however
>
> == SEC ==
> * Is very passive
> * oftentimes you may need to hook rule types together -- in groups
> * only usable in real time at the moment
> * can do everything that nagios does except topology
>
> Plugin talks to device, sec determines severity level, gives data back
> to nagios (nagios not time aware, sec is)
>
> * He has created a patch to Nagios that allows the active events to be
> passed through to sec - patch is in beta this month, still 2 open
> slots for more beta testers - beta period will last at least 2 months.
>
> When used with nagios his patch adds:
> * counting ok states before rearming
> * different triggers or polling intervals based on analysis of the error,
> not just non-ok severity
> * changing trouble thresholds per time period/activity
>
> * SEC also monitors nagios log file - often this file will show
> nagios configuration errors
>
> Contexts
> * See ssh example in 2004 lisa paper (http://www.cs.umb.edu/~rouilj/sec/)
>
> Nagios is good at "what is happening now"; sec is good at figuring out
> "how I got to now"
> * His patch will be released under GPL
>
> * Personal Website: www.cs.umb.edu/~rouilj
>
> * easy: passive service event -> nagios
> * trick here is getting active stream from Nagios
>
> OpenNMS (in 2004) - didn't have good correlation compared to nagios,
> and certainly not comparable to SEC
> * Does it have correlation now?
> * It used to have thresholding issues as well, and may still
>
> ZenOSS:
> * He couldn't see correlation aspects that he really needed.
>
> Temperature sensors - lmsensors and smartcontrol can be used instead
> of stand-alone devices in some cases
>
> Some tricks:
> * Rack as host - if 3 boxes in rack have high temp, rack is overheated
> * Room as host - "room is on fire" alert if 3 racks have high temp
> * But really needed "room is underwater" alert :-)
>
> * Q: lots of hosts - does he manually edit? A: Yes, but working towards
> defining every host once in config (his config mgmt app, akin to
> cfengine/puppet/bcfg2/lcfg)
> * automation issue: Think of a host group as a set, nagios only has
> set subtraction - makes automation very difficult
> * could just not use hostgroups, but then that makes the nagios web GUI
> suck
> * hostgroups for admin data
>
> * Groundworks stuff may be pretty good for automating config for lots
> of machines - http://www.groundworkopensource.com/products/os-overview.html
> * Nagios 3 isn't going to push config into DB - Nagios 4 might.
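
For anyone trying to picture the "rack as host" / "room as host" trick from the notes above, here is a rough Python sketch. It is only an illustration, not how Rouillard actually implements it in Nagios or SEC: the function names, the 30 C limit, and the sample readings are all made up. Only the "3 hot members" rule comes from the notes.

```python
from typing import Dict

HOT_MEMBER_COUNT = 3   # from the notes: 3 hot boxes -> rack, 3 hot racks -> room
TEMP_LIMIT_C = 30.0    # hypothetical "high temp" threshold, not from the talk


def rack_overheated(box_temps_c: Dict[str, float],
                    limit_c: float = TEMP_LIMIT_C,
                    min_hot: int = HOT_MEMBER_COUNT) -> bool:
    """Treat the rack as a host: it is overheated when enough boxes are hot."""
    hot_boxes = [name for name, temp in box_temps_c.items() if temp > limit_c]
    return len(hot_boxes) >= min_hot


def room_on_fire(racks: Dict[str, Dict[str, float]],
                 min_hot_racks: int = HOT_MEMBER_COUNT) -> bool:
    """Treat the room as a host: alert when enough racks are overheated."""
    hot_racks = [name for name, boxes in racks.items() if rack_overheated(boxes)]
    return len(hot_racks) >= min_hot_racks


# Made-up readings, purely for illustration.
hot_rack = {"box1": 35.0, "box2": 36.5, "box3": 34.0, "box4": 22.0}
cool_rack = {"box1": 21.0, "box2": 35.0, "box3": 22.5}
room = {"rack-a": hot_rack, "rack-b": hot_rack, "rack-c": cool_rack}

print(rack_overheated(hot_rack))   # three boxes above the limit
print(room_on_fire(room))          # only two of the three racks are hot
```

The point of rolling individual readings up into a pseudo-host is that one hot box stays a box-level problem, while the rack or room alert only fires once several members agree.
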
>
> * Oreon graphical interface for nagios - out of France - might be
> nice - http://www.oreon-project.org/
>
> --
> Daniel Clark # http://dclark.us # http://opensysadmin.com
>
> _______________________________________________
> bblisa mailing list
> [email protected]
> http://www.bblisa.org/mailman/listinfo/bblisa
