Hi List,
I'm running Nagios 3.2.0, compiled from source, on SLES 10 SP2.
The installation is pretty vanilla, except that we use MK Livestatus.
We regularly experience a Nagios crash. At first glance Nagios seems to work
as it should: the CGIs respond, but no more jobs are scheduled and the number
of hosts and services drops to zero.
Has anyone else had the same experience? Any tips and feedback would be
appreciated.
This is what our log file looks like (I have replaced hostnames with
XXXXXXXXXX):
...snip...
[1272660915] livestatus: Query: Filter: name = be-gen-wap-02
[1272660915] livestatus: Query: OutputFormat:json
[1272660915] livestatus: Query: KeepAlive: on
[1272660915] livestatus: Query: ResponseHeader: fixed16
[1272660915] livestatus: Time to process request: 81 us. Size of answer: 257 bytes
[1272660915] livestatus: Query: GET services
[1272660915] livestatus: Query: Filter: host_name = XXXXXXXXXX)
[1272660915] livestatus: Query: Columns: description display_name state host_alias host_address plugin_output notes last_check next_check state_type current_attempt max_check_attempts last_state_change last_hard_state_change perf_data scheduled_downtime_depth acknowledged host_acknowledged host_scheduled_downtime_depth has_been_checked
[1272660915] livestatus: Query: OutputFormat:json
[1272660915] livestatus: Query: KeepAlive: on
[1272660915] livestatus: Query: ResponseHeader: fixed16
[1272660915] livestatus: Time to process request: 10 us. Size of answer: 312 bytes
[1272660915] livestatus: Query: GET hosts
[1272660915] livestatus: Query: Columns: state plugin_output alias display_name address notes last_check next_check state_type current_attempt max_check_attempts last_state_change last_hard_state_change statusmap_image perf_data acknowledged scheduled_downtime_depth has_been_checked state
[1272660915] livestatus: Query: Filter: name = be-gen-wap-01
[1272660915] livestatus: Query: OutputFormat:json
[1272660915] livestatus: Query: KeepAlive: on
[1272660915] livestatus: Query: ResponseHeader: fixed16
[1272660915] livestatus: Time to process request: 81 us. Size of answer: 257 bytes
[1272660915] livestatus: Query: GET services
[1272660915] livestatus: Query: Filter: host_name = XXXXXXXXXX)
[1272660915] livestatus: Query: Columns: description display_name state host_alias host_address plugin_output notes last_check next_check state_type current_attempt max_check_attempts last_state_change last_hard_state_change perf_data scheduled_downtime_depth acknowledged host_acknowledged host_scheduled_downtime_depth has_been_checked
[1272660915] livestatus: Query: OutputFormat:json
[1272660915] livestatus: Query: KeepAlive: on
[1272660915] livestatus: Query: ResponseHeader: fixed16
[1272660915] livestatus: Time to process request: 9 us. Size of answer: 312 bytes
[1272660939] SERVICE ALERT: XXXXXXXXXX;Procs - Default;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 30 seconds.
[1272660959] SERVICE ALERT: XXXXXXXXXX;Procs - Default;OK;SOFT;2;CHECK_PROCS_MULTI OK - all processes OK
[1272660969] SERVICE ALERT: XXXXXXXXXX;CPU - Usage;OK;SOFT;2;OK - CPU0 'Load Percentage' = 4: OK - _Total 'Load Percentage' = 4:
[1272660979] SERVICE ALERT: XXXXXXXXXX;CPU - Usage;OK;SOFT;3;OK - CPU0 'Load Percentage' = 30: OK - CPU1 'Load Percentage' = 2: OK - _Total 'Load Percentage' = 16:
[1272660999] SERVICE ALERT: XXXXXXXXXX;CPU - Usage;UNKNOWN;SOFT;2;Unknown - LoadPercentage cannot be determined
[1272661034] livestatus: Query: GET hosts
[1272661034] livestatus: Query: Columns: childs
[1272661034] Caught SIGSEGV, shutting down...
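The last request logged before the SIGSEGV is "GET hosts" with "Columns: childs". One way to check whether that request on its own triggers the crash is to replay it against the Livestatus UNIX socket, ideally on a test instance rather than production. Below is a minimal Python sketch; the socket path is an assumption (use whatever path is given on the livestatus broker_module line in nagios.cfg), and build_query/send_query are hypothetical helper names, not part of Livestatus itself. The request format (one header per line, ended by a blank line, then closing the write side so Livestatus answers and closes) follows the usual Livestatus socket usage.

```python
import os
import socket

# Assumption: example path only; adjust to the socket path configured on the
# livestatus broker_module line in nagios.cfg.
SOCKET_PATH = "/usr/local/nagios/var/rw/live"


def build_query(table, columns, headers=None):
    """Build a Livestatus request: one header per line, ended by a blank line."""
    lines = ["GET " + table]
    if columns:
        lines.append("Columns: " + " ".join(columns))
    for header in headers or []:
        lines.append(header)
    return "\n".join(lines) + "\n\n"


def send_query(query, path=SOCKET_PATH, timeout=10.0):
    """Send one request without KeepAlive and return the raw answer as text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(query.encode("utf-8"))
        # Close our write side to mark the end of the request.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break  # Livestatus closed the connection after answering
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Replay the request logged right before "Caught SIGSEGV".
    request = build_query("hosts", ["childs"])
    if os.path.exists(SOCKET_PATH):
        print(send_query(request))
    else:
        print("Livestatus socket not found:", SOCKET_PATH)
```

If Nagios dies on this single request, that would point at the childs column handling; if it survives, the crash likely depends on something else happening at the same moment.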
The debug file looks like this:
[1272661033.237415] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.237436] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.239257] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661033.239289] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.239297] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.239356] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.239362] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.241357] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661033.241390] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.241399] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.241433] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.241438] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.243672] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661033.243719] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.243730] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.243778] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.243785] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.245368] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661033.245401] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.245410] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.245445] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.245449] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.248076] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661033.248111] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.248120] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.248182] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.248187] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.249484] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661033.249514] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.249523] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.249566] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661033.249571] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661033.251416] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661033.505428] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661033.757384] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661034.009381] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661034.009432] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.009442] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.009510] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.009515] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.011394] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661034.011426] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.011435] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.011482] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.011487] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.013339] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661034.013376] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.013385] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.013454] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.013459] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.014324] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661034.014352] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.014361] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.014409] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.014414] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.017539] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661034.017579] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.017589] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.017679] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.017687] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.019850] [064.1] [pid=9031] Making callbacks (type 8)...
[1272661034.019903] [064.1] [pid=9031] Making callbacks (type 13)...
[1272661034.019916] [064.2] [pid=9031] Callback #1 (type 13) return code = 0
[1272661034.019951] [064.1] [pid=9031] Making callbacks (type 9)...
[1272661034.019967] [064.2] [pid=9031] Callback #1 (type 9) return code = 0
[1272661034.020002] [064.1] [pid=9031] Making callbacks (type 13)..
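Both files stamp each line with Unix epoch seconds (the debug file adds a microsecond fraction), so the last debug entry and the "Caught SIGSEGV" line in the main log can be lined up directly. A small Python sketch for converting those stamps to UTC while comparing the two files; the helper name to_utc is hypothetical, and it only assumes the "[epoch]" or "[epoch.fraction]" prefix seen in the excerpts above.

```python
import re
from datetime import datetime, timezone

# Matches the leading "[epoch]" or "[epoch.fraction]" stamp of a log line.
STAMP = re.compile(r"^\[(\d+(?:\.\d+)?)\]")


def to_utc(line):
    """Return the line's leading epoch stamp as a UTC datetime, or None."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.fromtimestamp(float(match.group(1)), tz=timezone.utc)


if __name__ == "__main__":
    samples = [
        "[1272661034] Caught SIGSEGV, shutting down...",
        "[1272661034.020002] [064.1] [pid=9031] Making callbacks (type 13)..",
    ]
    for sample in samples:
        print(to_utc(sample).isoformat(), sample)
```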
--
Jelle Smet
http://www.smetj.net
------------------------------------------------------------------------------
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue.
::: Messages without supporting info will risk being sent to /dev/null