Guillaume,

We actually have the same problem here where Nagios is set up. I use the NRPE daemon on both Windows and Linux servers. Two of our servers run backups early in the morning, and during that window we get NRPE timeout messages from those two servers. I set the timeout on both the Nagios server and the clients to 30 seconds, thinking that would fix it, but alas, we still get timeout messages. I haven't had much time to look into a fix, so for now our workaround is to not email out warning notifications from those two servers. (The thresholds are set so that the critical notifications still leave enough time to deal with the problem before it gets too serious.) My conclusion is that the backups eat up so much processing power on those servers that NRPE times out while trying to connect and send its results.
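One way to check whether the backups really are what slows NRPE down is to time a plain TCP connect to the NRPE port every minute or so, straight through the backup window. The Python below is only a minimal sketch and not part of NRPE or Nagios; it assumes the daemon listens on 5666 (NRPE's default port), and the names probe_connect and log_probes are hypothetical. Keep in mind that a successful connect only means the server's kernel completed the TCP handshake; nrpe itself may not have called accept() on it yet.

```python
import socket
import time
from typing import Optional

# Assumption: NRPE listens on its default port; change this if your nrpe.cfg differs.
NRPE_DEFAULT_PORT = 5666


def probe_connect(host: str, port: int = NRPE_DEFAULT_PORT,
                  timeout: float = 10.0) -> Optional[float]:
    """Return the seconds taken to complete a TCP connect, or None on failure.

    Success only proves the kernel finished the handshake; it does not
    prove the NRPE daemon has accepted the connection.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections and socket.timeout (an OSError subclass).
        return None


def log_probes(host: str, count: int, interval: float = 60.0,
               port: int = NRPE_DEFAULT_PORT) -> None:
    """Print one timestamped probe result per line, `count` times."""
    for i in range(count):
        elapsed = probe_connect(host, port)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "FAILED" if elapsed is None else f"{elapsed:.3f}s",
              flush=True)
        if i < count - 1:
            time.sleep(interval)
```

For example, running log_probes("backupserver", count=120) from the Nagios host (the hostname is made up) would log two hours of one-minute samples; if connect times jump or fail only while the backup runs, that would support the CPU-load theory.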
If you find a solution, or even an idea to try, feel free to let me know and I'll give it a shot!

Jayson Broughton
Linux Systems Administrator
True Computer Operations Dept.
True Oil LLC

-----Original Message-----
From: Guillaume Rousse [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 08, 2008 4:44 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] NRPE way too fragile ?

Hello list.

I'm using nrpe quite heavily for testing lots of local services on all my machines. It usually works well, but it seems a bit unreliable: too often, nrpe itself fails to accept incoming connections, and the tests fail with:

CHECK_NRPE: Socket timeout after 10 seconds.

stracing the nrpe process shows it is probably waiting on another connection itself:

[EMAIL PROTECTED] ~]# strace -p 22444
Process 22444 attached - interrupt to quit
select(6, [5], NULL, [5], {0, 170000}) = 0 (Timeout)
accept(5, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)

It usually recovers by itself, but that's enough to cause many unwanted notifications, even though all monitored services have nrpe itself as a dependency.

I'm using SSL encryption, as is usually advised, but I'm planning to switch to plain-text connections (everything happens on a distinct VLAN, without user access).

Does anyone else have a similar experience?
--
Guillaume Rousse
Moyens Informatiques - INRIA Futurs
Tel: 01 69 35 69 62

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null

The information in this electronic mail message and any attached files is confidential and may be legally privileged. If you are not the intended recipient, delete this message and contact the sender immediately. Access to this message by anyone other than its intended recipient is unauthorized. You must not use or disseminate this information as it is proprietary property of the True companies. Communications on or through the True companies' computer systems may be monitored or recorded to secure effective system operation and for other lawful purposes. Thank you.