Marc Powell wrote:
> On Feb 5, 2010, at 10:41 AM, Tony Johansson wrote:
>
>> Hello,
>>
>> Our nagios 3.2.0 installation is having major problems.
>> The nagios process dies silently about 10-60 seconds after being started.
>> No record as to why in any logfiles.
>>
>> Have tried setting max debug (debug_level=-1 and debug_verbosity=2) in
>> nagios.cfg - nothing.
>>
>> System is a CentOS release 5.4 which has been running fine for months.
>>
>> Any ideas on how to troubleshoot this or what is going on?
>
> Try running it in the foreground (without -d). If you don't see anything
> interesting when it dies, run it in the foreground through strace (strace
> -fFs512 /path/to/nagios -c /path/to/nagios.cfg).
>
> Verify you haven't run out of disk space or anything simple like that. If
> you're running SELinux, verify that there are no errors related to it in
> /var/log/messages.
>
> Is there anything special about the install or the machine it's running on?
> Are any of the nagios directories mounted from remote machines?
>
> --
> Marc

Hello all,

Nothing special with the install, everything is on the same machine. Ran strace as suggested:

strace -fFs512 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

[pid 32731] write(3, "[1265393566.503713] [016.2] [pid=32731] Processed service performance data file output: 1265393559||AHS||C: Drive Space||c:\\ - total: 15.86 Gb - used: 7.60 Gb (48%) - free 8.26 Gb (52%)||c:\\ Used Space=7.60Gb;14.27;15.54;0.00;15.86\n", 232) = 232
[pid 32731] _llseek(3, 0, [657557], SEEK_CUR) = 0
[pid 32731] write(6, "1265393559||AHS||C: Drive Space||c:\\ - total: 15.86 Gb - used: 7.60 Gb (48%) - free 8.26 Gb (52%)||c:\\ Used Space=7.60Gb;14.27;15.54;0.00;15.86\n", 144) = -1 EFBIG (File too large)
[pid 32731] --- SIGXFSZ (File size limit exceeded) @ 0 (0) ---
[pid 32732] +++ killed by SIGXFSZ +++

"File size limit exceeded" seems to be the cause.

There is plenty of disk space:

df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       68G   28G   38G  43% /
/dev/sda1              99M   30M   65M  32% /boot
tmpfs                 506M     0  506M   0% /dev/shm

Also, I did try renaming retention.dat and status.dat and moving files out of checkresults earlier, with no result.

Seems like /var/spool/nagios/perfdata.log is 2G while /var/spool/nagios/perfdata.log is a mere 11K.

I renamed the file and restarted nagios, which now seems to run OK.

Looks like I need to set up log rotation. Or what is the best way to handle perfdata.log?

Many thanks,
Tony
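
For anyone hitting the same EFBIG/SIGXFSZ failure: the `_llseek` calls in the trace suggest a 32-bit build, where a file written without large-file support typically cannot grow past 2^31 - 1 bytes. That would fit a perfdata.log of about 2G. The following is only a minimal sketch of a check that flags files approaching that ceiling; the function name `files_near_limit`, the 90% threshold, and the assumption that the limit is exactly 2^31 - 1 bytes are illustrative and not taken from Nagios itself.

```python
import os

# Assumed ceiling: 2 GiB - 1, the largest size a file can reach when the
# writing process lacks large-file support (an assumption for this sketch).
LIMIT_BYTES = 2**31 - 1


def files_near_limit(directory, threshold=0.9, limit=LIMIT_BYTES):
    """Return sorted (path, size) pairs for regular files whose size is
    at or above threshold * limit. Subdirectories are not descended into."""
    hits = []
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
            if size >= threshold * limit:
                hits.append((entry.path, size))
    return sorted(hits)


if __name__ == "__main__":
    spool = "/var/spool/nagios"  # directory mentioned in the message above
    if os.path.isdir(spool):
        for path, size in files_near_limit(spool):
            print(f"{path}: {size} bytes, close to the 2 GiB limit")
```

Run from cron, a check like this would give a warning before the file reaches the limit, but it does not replace rotating or otherwise processing the perfdata file regularly.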