-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 8, 2008, at 10:06 AM, Jayson Broughton wrote:

> Guillaume,
>
> We actually have the same problem here where Nagios is setup. I use the
> NRPE daemon on both windows and linux servers. Two of our servers do
> backups early in the morning and during that time we get NRPE Timeout
> messages from the two servers. I have set the timeout on the server and the
> clients to timeout after 30 seconds, thinking that would fix it. But alas,
> we still get timeout messages. I haven't had much time to see what I can do
> to fix this, so for now our solution is to not email out warning messages
> from those two servers (I have the thresholds set enough where the critical
> messages still gives enough window to take care of the problem before going
> too critical) I have come to the conclusion that the servers are running
> the backup and eating up so much processing that the nrpe times out trying
> to connect and send information.
>
> If you find a solution or even an idea to try, feel free to let me know and
> I'll give it a shot!

NRPE is normally set up with xinetd listening for incoming connections.
By "normally" I mean if you followed the documentation. ;) By default,
xinetd keeps a low threshold on connections per instance, both to lessen
the load on the server and to blunt DDoS-type attacks. I described this
in my previous post:

http://article.gmane.org/gmane.network.nagios.user/56713

NRPE can be fine until Nagios runs enough checks against the same host
at the same time to hit that threshold, for example when you go to the
extended host information and choose "Schedule a check of all services
on this host".

If you are not running it under xinetd, the cause may be something else,
including system load or a possible bug in NRPE on your system.

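As a rough illustration of the xinetd limits mentioned above, here is a
minimal sketch of an NRPE service entry with the limits raised. The file
location, the install paths, the only_from address and the numeric
values are assumptions for illustration only, not recommendations; adapt
them to your own installation.

```conf
# Hypothetical /etc/xinetd.d/nrpe (paths and values are assumptions)
service nrpe
{
    flags           = REUSE
    socket_type     = stream
    port            = 5666
    wait            = no
    user            = nagios
    group           = nagios
    server          = /usr/local/nagios/bin/nrpe
    server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    # Address of the Nagios server (placeholder)
    only_from       = 192.0.2.10
    # Maximum number of simultaneous NRPE server processes
    instances       = 100
    # Maximum simultaneous connections from a single source address
    per_source      = UNLIMITED
    # Connections per second allowed, then seconds to wait once exceeded
    cps             = 100 2
    disable         = no
}
```

xinetd has to be restarted or reloaded before the new limits take
effect. If the timeouts stop after raising them, the connection limit
was the likely cause.
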
>
>> -----Original Message-----
>> From: Guillaume Rousse [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, October 08, 2008 4:44 AM
>> To: nagios-users@lists.sourceforge.net
>> Subject: [Nagios-users] NRPE way too fragile ?
>>
>> Hello list.
>>
>> I'm using nrpe quite heavily for testing lots of local service on all my
>> machines. It work usually well, but seems a bit unreliable: too much
>> often, nrpe itself fails to accept incoming connections, and test fails:
>> CHECK_NRPE: Socket timeout after 10 seconds.
>>
>> stracing nrpe process shows it is probably waiting itself on another
>> connection:
>> [EMAIL PROTECTED] ~]# strace -p 22444
>> Process 22444 attached - interrupt to quit
>> select(6, [5], NULL, [5], {0, 170000}) = 0 (Timeout)
>> accept(5, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
>>
>> It usually recovers itself alone, but that's enough to cause much
>> unwanted notifications, even if all monitored services have nrpe itself
>> as dependency. I'm using ssl encryption, as usually advised, but I'm
>> planning shifting to plain-text connection (everything occurs on a
>> distinc VLAN, without user access).
>>
>> Does everyone else has similar experience ?

You seem to be running it as a standalone daemon. What system and NRPE
version are you running? What version of Nagios? How many NRPE checks
are you trying to perform in a given time?

You may see some benefit from having Nagios spread its checks out more
evenly, though that only covers up the underlying problem. As a test,
consider running NRPE under xinetd and increasing the number of
connections allowed per second.

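For the check spreading mentioned above, a sketch of the relevant
nagios.cfg settings might look like the following. The directive names
are standard nagios.cfg options, but the values shown are assumptions
for illustration and should be checked against the documentation for
your Nagios version.

```conf
# Fragment of nagios.cfg (values are illustrative assumptions)

# Let Nagios calculate the delay between initial service checks
service_inter_check_delay_method=s

# Let Nagios calculate how checks are interleaved across hosts, so that
# checks for one host are not all scheduled back to back
service_interleave_factor=s

# Minutes over which initial service checks are spread after a restart
max_service_check_spread=30

# Upper bound on service checks running at once (0 means no limit)
max_concurrent_checks=50
```

Nagios needs a restart to pick up these changes. As noted, this only
smooths out the load on NRPE; it does not explain why connections are
being dropped in the first place.
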
Mark Young
___
Nagios Enterprises, LLC
Web: www.nagios.com

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAkjs3WoACgkQ0KipU7WwlaWtXgCgs+77/o8Pyh0t/++FIbOEycgx
oiAAoMj2awwRG4HCernz7pcdf/K484Ca
=oKG0
-----END PGP SIGNATURE-----