[Nagios-users] Nagios caught SIGSEGV but doesn't seem to shut down all the way

2008-12-26, Chris Beattie
Hello all,

I'm running Nagios 3.0.6, compiled from unmodified source, on CentOS 5.2
x86_64. I noticed that notifications stopped early this morning, and the
log said Nagios had caught SIGSEGV and was shutting down. Nagios doesn't
appear to go all the way down, though. All the CGIs still work, but no
checks are being performed. The lock file is still present, and
nagios.cmd still exists. The first crash I saw happened after Nagios had
been running fine for a while, but the same thing happens, defunct
processes and all, if I run killall -SIGSEGV nagios. This is what I got
after running the killall, then service nagios start, then another
killall:

# ps -fC nagios
UID        PID  PPID  C STIME TTY          TIME CMD
nagios    1469     1  0 10:47 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1470  1469  0 10:47 ?        00:00:00 [nagios] <defunct>
nagios    1918     1  6 10:51 ?        00:02:55 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   16350  1918  0 11:25 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   16351 16350  0 11:25 ?        00:00:00 [nagios] <defunct>
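The defunct [nagios] entries in this listing are zombie processes: children that have exited but whose parent has not yet collected their exit status with wait(). The following is a minimal sketch for picking them out of a process listing, assuming the procps `ps` that ships with CentOS; `list_zombies` is a hypothetical helper name, not something provided by Nagios.

```sh
#!/bin/sh
# list_zombies: hypothetical helper, not part of Nagios.
# Reads "pid ppid stat comm" lines on stdin (the format produced by
# `ps -o pid=,ppid=,stat=,comm=`) and prints every process whose
# state field starts with Z, i.e. a zombie that ps shows as <defunct>.
list_zombies() {
    awk '$3 ~ /^Z/ { printf "zombie pid=%s ppid=%s cmd=%s\n", $1, $2, $4 }'
}
```

On the primary it could be fed with `ps -C nagios -o pid=,ppid=,stat=,comm= | list_zombies`; the trailing `=` after each field name suppresses the header line. The reported PPID shows which nagios process has not reaped the child.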



Thanks to Paul Weaver's suggestion earlier this month, I have a failover
Nagios server running. Once a minute, it checks the primary server. I
didn't set the failover conditions correctly, so it didn't take over in
this case, although it sometimes takes over briefly when I restart the
primary Nagios after updating its object configuration files. This is
the output of its check_nagios command after the primary Nagios gets a
SIGSEGV:

# ./check_by_ssh -H primaryhostname \
    --command='/usr/local/nagios/libexec/check_nagios --filename=/usr/local/nagios/var/status.dat --expires=60 --command=nagios'

NAGIOS OK: 3 processes, status log updated 228 seconds ago

When I corrected the --expires value, check_nagios returned a WARNING
state, and I could have failed over on that. The way I had things set
up, however, the failover server thought everything was fine. That part
is my problem to fix, but shouldn't Nagios shut all the way down as well?
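One way to make the failover decision stricter is to treat an old status.dat as a failure in its own right. As far as I can tell from its help text, check_nagios reads --expires in minutes, so --expires=60 accepts a status file up to an hour old, which would explain the OK result at 228 seconds. The sketch below is a hypothetical helper, `check_status_age`, that instead compares the file's modification time against thresholds in seconds and returns the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It assumes GNU coreutils `date` and `stat`, as on CentOS; the function name and thresholds are illustrative, not part of Nagios or the plugins.

```sh
#!/bin/sh
# check_status_age FILE WARN_SECS CRIT_SECS
# Hypothetical helper, not part of Nagios or the Nagios plugins.
# Reports how long ago FILE was last modified and returns a
# plugin-style exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_status_age() {
    file=$1
    warn=$2
    crit=$3
    if [ ! -r "$file" ]; then
        echo "STATUS UNKNOWN: cannot read $file"
        return 3
    fi
    now=$(date +%s)               # current time, seconds since the epoch
    mtime=$(stat -c %Y "$file")   # last modification time (GNU stat)
    age=$((now - mtime))
    if [ "$age" -ge "$crit" ]; then
        echo "STATUS CRITICAL: $file updated $age seconds ago"
        return 2
    elif [ "$age" -ge "$warn" ]; then
        echo "STATUS WARNING: $file updated $age seconds ago"
        return 1
    fi
    echo "STATUS OK: $file updated $age seconds ago"
    return 0
}
```

Checking the modification time relies on the daemon rewriting status.dat every status_update_interval seconds (set in nagios.cfg) while it is healthy, so a file that stops changing suggests the main process has stopped working even when `ps` still lists nagios processes. Any thresholds should sit comfortably above that interval. Wrapped in a small script on the primary, the helper could be called through check_by_ssh the same way check_nagios is.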



Thanks!

-Chris


--
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] Nagios caught SIGSEGV but doesn't seem to shut down all the way

2008-12-26, Hugo van der Kooij

Chris Beattie wrote:

> I'm running Nagios 3.0.6 compiled from unmodified source on CentOS 5.2
> x86_64.  I noticed that notifications stopped early this morning, and
> the logs said Nagios caught SIGSEGV, and it was shutting down.  Nagios
> doesn't appear to go all the way down, though.  All the CGIs still work,
> but no checks are being performed.  There is a lock file, and nagios.cmd
> still exists.  The first one I saw happened after Nagios had been
> running fine for a while, but the same thing happens if I issue a
> killall -SIGSEGV nagios command, defunct processes and all.  This is
> what I got after I did the killall, then a service nagios start, then
> another killall.

Well, given that the Nagios daemon is not the same thing as the CGI
binaries that make up your website, this is to be expected.

Hugo.

--
hvdko...@vanderkooij.org   http://hugo.vanderkooij.org/
PGP/GPG? Use: http://hugo.vanderkooij.org/0x58F19981.asc

A: Yes.
Q: Are you sure?
A: Because it reverses the logical flow of conversation.
Q: Why is top posting frowned upon?

Bored? Click on http://spamornot.org/ and rate those images.

I am not in the office at the moment. Send any work to be translated.
