No, not all checks.  I see check_ping processes still firing up:

[EMAIL PROTECTED] etc]# ps xauwwww -H| grep nagios | grep -v grep
nagios 26676 11.0 0.1 28620 3852 ? Ssl 13:35 0:11 /usr/bin/nagios -d /etc/nagios/nagios.cfg
nagios 26814  0.0 0.1 28624 3852 ? S   13:36 0:00 /usr/bin/nagios -d /etc/nagios/nagios.cfg
nagios 26815  0.0 0.0  4684  640 ? S   13:36 0:00 /usr/lib/nagios/plugins/check_ping -H 172.28.7.59 -w 3000.0,80%% -c 5000.0,100%% -p 15 -t 30
nagios 26816  0.0 0.0  2580  528 ? S   13:36 0:00 /bin/ping -n -U -w 90 -c 15 172.28.7.59

I am seeing the same thing as you: only certain hosts/hostgroups are
being checked, and then all of a sudden everything stops except the
pings shown above, and even those checks are not being recorded in
nagios.log.  Very weird.

On 3/17/06, Eli Stair <[EMAIL PROTECTED]> wrote:
>
> So you're seeing the scenario where nagios stops _all_ checks
> altogether?  I've had this happen when the nagios parent process dies,
> and logs to nagios.log to this effect "[1139362901] Caught SIGSEGV,
> shutting down... ".  I was getting these very frequently when I went
> above some apparent host/service threshold (went away when I removed
> about 128 nodes at one point recently).  In these cases the CGIs still
> respond for some reason, which seemed inappropriate...
>
> I've also seen the same symptom, but without a well-advertised nagios
> failure, where the process is still present in memory but checks aren't
> executed and the CGIs are functional.
>
> The third related (and my current bane...) issue is where almost all
> checks occur, but some (sometimes large) groups of unrelated actions no
> longer occur.  Host/service checks as a whole seem to be working, but
> I'll notice that I haven't gotten an alert for something that failed,
> and then see that a whole class of service checks on one hostgroup isn't
> running anymore... and then start to see the same issue with other
> checks/actions as well.
>
> I'd sure love to just have nagios start working again, as I'm strongly
> against having to write an external framework for checking various parts
> of Nagios and alerting me when it's broken!  Alternately, I've always kept
> up to date on other OS monitor/alert frameworks and still nothing is as
> extensible as Nagios is (yet).
>
> /eli
>
>
> Terry wrote:
> > In just looking at the logs, the status.log is being continuously
> > updated as normal but when checks stop, the nagios.log stops gathering
> > entries as well.
> >
> > On 3/17/06, Eli Stair <[EMAIL PROTECTED]> wrote:
> >
> >>I've been seeing this continuously in 2.0beta/rc/releases.  For details
> >>on my situation/posts check the devel/users archives, I'm curious if any
> >>similarities exist.  I haven't gotten acknowledgement/resolution on this
> >>either, the only thing I've determined is that (in my case) stopping
> >>nagios and restarting with the retention file zeroed resolves the issue
> >>100%.
> >>
> >>Having an extra nagios process running can definitely cause this and
> >>other issues.  In my case that's never been present and thus not the
> >>cause...
> >>
> >>/eli
> >>
> >>Terry wrote:
> >>
> >>>I am seeing this as well.  I have services that do not get checked
> >>>when they are scheduled:
> >>>
> >>>Last Check Type: ACTIVE
> >>>Last Check Time: 03-17-2006 08:50:47
> >>>Status Data Age: 0d 1h 37m 51s
> >>>Next Scheduled Active Check: 03-17-2006 10:09:01
> >>>Latency: 342.408 seconds
> >>>Check Duration: 10.015 seconds
> >>>Last State Change: 03-16-2006 11:55:02
> >>>Current State Duration: 0d 22h 33m 36s
> >>>
> >>>It is currently 10:29 and it still hasn't been checked.  This is one of
> >>>many examples.
> >>>
> >>>On 3/15/06, Matthias Eble <[EMAIL PROTECTED]> wrote:
> >>>
> >>>
> >>>>hi all!
> >>>>
> >>>>we are experiencing occasional problems with nagios 2.0 stable.  The
> >>>>main process was reloaded due to configuration changes yesterday (Mar
> >>>>14th).  Since then ps -ef looks like this:
> >>>>
> >>>>nagios 1078    1 12 Mar09 ? 16:49:43 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
> >>>>nagios 9431 1078  0 Mar14 ? 00:00:00 [nagios] <defunct>
> >>>>
> >>>>and nagios stopped running checks.  Does anyone have an idea what could
> >>>>have happened?  The nagios.log and status.dat files have not been
> >>>>updated since then.
> >>>>
> >>>>thanks
> >>>>matthias
> >>>>
> >>>>-------------------------------------------------------
> >>>>This SF.Net email is sponsored by xPML, a groundbreaking scripting
> >>>>language that extends applications into web and mobile media.  Attend
> >>>>the live webcast and join the prime developer group breaking into this
> >>>>new coding territory!
> >>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> >>>>_______________________________________________
> >>>>Nagios-users mailing list
> >>>>Nagios-users@lists.sourceforge.net
> >>>>https://lists.sourceforge.net/lists/listinfo/nagios-users
> >>>>::: Please include Nagios version, plugin version (-v) and OS when
> >>>>reporting any issue.
> >>>>::: Messages without supporting info will risk being sent to /dev/null
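
The common symptom in this thread is that nagios.log (and, in Matthias's
case, status.dat) stops being updated while the nagios process is still
present.  A minimal sketch of an external staleness check follows, in
Python.  It only compares file modification times against a threshold; the
file paths and the 600-second limit are hypothetical and not taken from
anyone's setup in this thread, and the function names are made up for
illustration.  It does not diagnose why checks stop.

```python
import os
import time


def seconds_since_modified(path, now=None):
    """Return how many seconds ago `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def stale_files(paths, max_age_seconds, now=None):
    """Return the paths that are missing or older than `max_age_seconds`."""
    stale = []
    for path in paths:
        try:
            age = seconds_since_modified(path, now)
        except OSError:
            # A missing or unreadable file is treated as stale.
            stale.append(path)
            continue
        if age > max_age_seconds:
            stale.append(path)
    return stale


if __name__ == "__main__":
    # Hypothetical paths and threshold; adjust to the local installation.
    watched = ["/var/log/nagios/nagios.log", "/var/log/nagios/status.dat"]
    for path in stale_files(watched, max_age_seconds=600):
        print("no recent update: %s" % path)
```

Run from cron, something like this would at least flag the "process alive
but log silent" case described above.  Note that Terry reports the status
file still being updated while nagios.log goes quiet, so watching only the
status file may not be enough.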