Ahoy again. Since discussion on the last requests for comments and patches has splintered off and gotten somewhere, it's time for the next mail in the series on what we awesome gods of the Nagios core have decided to work on for the next grand version of Nagios.
This idea comes from Shinken, mod_gearman and DNX, which have all implemented versions of it, so creds and kudos to the authors of those projects.

Currently, Nagios eats quite a lot of I/O when writing, scanning for and reading the check result files. This becomes especially noticeable in large installations. There's also the problem of Nagios using a lot more copied memory per fork than it's supposed to, and the fact that embedding scripting languages inside the Nagios core to speed up execution is a potentially disastrous action (as the embedded Perl debacle has proven).

The idea to solve all of that is to fork() off a set of worker processes at startup. Each worker free()s all the memory it can and re-connects to the master process via a unix domain socket (or a network socket that by default only listens on the localhost address) to receive requests to run commands and return the results of those commands.

This has several benefits, although they're not immediately user visible.

* I/O load will decrease significantly, leaving more disk throughput capacity for performance data graphing or status data database solutions.
* Scripting languages can be embedded regardless of memory leaks and whatnot, since worker daemons can be killed off and respawned every 50000 checks (or something), thus causing the kernel to clean up any and all leaked memory.
* Nagios core can be single-threaded, which means higher portability, less memory usage and more robust code.
* Eventbroker modules that use a socket to communicate with an external daemon can instead register a handler for inbound packets and then simply "own" that connection and get all future packets from it forwarded as eventbroker events. This will of course reduce the module complexity quite a bit for nearly all much-used modules today (Merlin, livestatus, DNX, mod_gearman, NDOUtils, etc...)
* It becomes possible to receive responses from Nagios when submitting commands (the current FIFO pipe is one-way communication only).

Drawbacks:

* It's quite a large and invasive change to the Nagios core which will require a lot of testing.

I know some people I met in Italy have already volunteered to help implement and test this (Hi Cheik), but it would definitely be helpful to get feedback from module authors and users when making this change to Nagios.

Please note that a compatibility daemon which continues to parse the simple FIFO will of course have to be implemented so that current scripts and whatnot keep on working. The API to scan for and read check result files will also remain for the foreseeable future, although possibly implemented as an external helper program which ships check results into the Nagios socket instead.

Comments, patches and (before summer's out) testing are very much appreciated.

--
Andreas Ericsson                   andreas.erics...@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace.
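
For illustration only, here is a rough Python sketch of the worker idea described above. It is not the planned implementation (the core is C, and the wire protocol is not specified in this mail). It assumes a made-up newline-delimited JSON protocol, uses socketpair() in place of a named unix domain socket the worker re-connects to, and cannot free() inherited memory the way a C worker would. The names spawn_worker, worker_loop and run_check are hypothetical.

```python
import json
import os
import socket
import subprocess


def worker_loop(sock, max_jobs=50000):
    """Run commands received on sock until max_jobs is reached or the master hangs up."""
    reader = sock.makefile("r")
    writer = sock.makefile("w")
    handled = 0
    for line in reader:
        job = json.loads(line)  # hypothetical request: {"id": ..., "command": ...}
        proc = subprocess.run(job["command"], shell=True,
                              capture_output=True, text=True)
        result = {"id": job["id"],
                  "return_code": proc.returncode,
                  "output": proc.stdout.strip()}
        writer.write(json.dumps(result) + "\n")
        writer.flush()
        handled += 1
        if handled >= max_jobs:
            break  # exit so the kernel reclaims anything leaked along the way


def spawn_worker(max_jobs=50000):
    """Fork one worker; return (pid, master's end of the socket)."""
    master_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: keep only the worker end, serve jobs, then exit immediately.
        master_end.close()
        try:
            worker_loop(worker_end, max_jobs)
        finally:
            os._exit(0)
    # Parent (master): keep only the master end.
    worker_end.close()
    return pid, master_end


def run_check(sock, job_id, command):
    """Send one job to a worker and block until its result line comes back."""
    request = json.dumps({"id": job_id, "command": command}) + "\n"
    sock.sendall(request.encode())
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("worker exited before replying")
        buf += chunk
    return json.loads(buf)
```

A master would typically call spawn_worker() a few times at startup, hand out jobs with something like run_check(sock, 1, "echo OK"), and fork a replacement whenever waitpid() reports that a worker has hit its job limit and exited. A real master would multiplex many workers without blocking on any single one; the blocking call here just keeps the sketch short.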