Well,
I advise you to check the firewall logs. I had similar problems
before. Because I have a lot of distributed servers (23) that manage 5000
services in total, there are a lot of connections to port 5667 on the
central server, and sometimes the firewall decides that this is an attack
and blocks the traffic. The nsca communication is then left unterminated
and nsca is left 'hanging' on your central server.
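One rough way to see whether connections on port 5667 are being left open is to count their TCP states on the central server. The following is only a sketch: it assumes output in the usual `netstat -tan` column layout, the function name `count_states` is made up for illustration, and the sample addresses are invented, not taken from a real server.

```python
from collections import Counter

NSCA_PORT = 5667  # the nsca port mentioned in the message above


def count_states(netstat_text, port=NSCA_PORT):
    """Count TCP states for sockets whose local port equals `port`.

    Assumes lines laid out like `netstat -tan` output:
    proto recv-q send-q local-address foreign-address state
    """
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            states[fields[5]] += 1
    return states


# Hypothetical sample data, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:5667            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.1:5667           10.0.0.21:40112         ESTABLISHED
tcp        0      0 10.0.0.1:5667           10.0.0.22:40113         ESTABLISHED
tcp        0      0 10.0.0.1:5667           10.0.0.23:40114         CLOSE_WAIT
tcp        0      0 10.0.0.1:22             10.0.0.50:51000         ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in sorted(count_states(SAMPLE).items()):
        print(state, n)
```

If the same connections show up in the same state run after run, that would fit the picture of traffic being dropped partway through.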
Frederik
Chris Goosen wrote:
Yes, I have an ISA 2004 (if you can call ISA a firewall!!) between the 2
servers
-----Original Message-----
From: Frederik Vanhee [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 18, 2006 10:37 PM
To: basile au siris
Cc: Chris Goosen; nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] nsca / distributed monitoring result problem
Hello,
Is there a firewall between the central server and the distributed
server?
Frederik
basile au siris wrote:
Hi,
Maybe I have the same problem.
I have distributed monitoring, and the central server sometimes freezes
so that I just
have to power-cycle it.
I suspect nsca (or a hardware problem), because I sometimes notice that
there are many (50)
nsca processes, and if I restart it everything becomes normal again.
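A pile-up like this could be caught earlier by counting the nsca processes at regular intervals. This is a minimal sketch, assuming the input is the output of `ps -eo comm` (one command name per line); the function names and the limit of 20 are arbitrary choices for illustration, not recommended values.

```python
def count_processes(ps_text, name="nsca"):
    """Return how many lines of `ps -eo comm` style output equal `name`."""
    return sum(1 for line in ps_text.splitlines() if line.strip() == name)


def check(ps_text, limit=20, name="nsca"):
    """Build a one-line status message; `limit` is an assumed threshold."""
    n = count_processes(ps_text, name)
    status = "WARNING" if n > limit else "OK"
    return f"{status} - {n} {name} processes (limit {limit})"


if __name__ == "__main__":
    # Hypothetical sample: 50 nsca processes plus two unrelated ones.
    sample = "COMMAND\n" + "nsca\n" * 50 + "nagios\nsshd\n"
    print(check(sample))
```

On a live system the text could come from running `ps -eo comm` and passing its standard output to `check`.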
hope we solve our problem
basile
Chris Goosen wrote:
Hello all..
I am running my Nagios central server on an HP 2.4 GHz with 512 MB RAM.
At present, I am monitoring 65 hosts with approx. 400 services.
After a reboot, everything works perfectly, but the longer my server
runs, the more sluggish it gets and eventually the nsca processes
consume all the memory and the server stops responding. What also
happens is that I start getting hosts that are reported as down even
though they respond correctly to ping. The error says "PLUGIN
TIMED OUT after 10 seconds".
Here is an example of what I mean:
Host State Information
Host Status:                  DOWN
Status Information:           CRITICAL - Plugin timed out after 10 seconds
Last Status Check:            01-16-2006 12:06:28
Status Data Age:              0d 0h 2m 57s
Last State Change:            01-16-2006 10:20:44
Current State Duration:       0d 1h 48m 41s
Last Host Notification:       01-16-2006 10:20:44
Current Notification Number:  2
Is This Host Flapping?        N/A

OK 01-16-2006 12:05:47 63d 19h 30m 59s 1/3 PING OK - Packet loss = 0%, RTA = 0.42 ms
I assume that these are related and that the lack of memory caused
this problem. Would an upgrade from Nagios 1.2 to Nagios 1.3 fix
this? If so, what is the best way to perform that upgrade?
My /etc/xinetd.d/nsca file:
# default: on
# description: NSCA
service nsca
{
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/sbin/nsca
        server_args     = -c /home/e-smith/nagios/nsca.cfg --inetd
        cps             = 9000 30
        instances       = UNLIMITED
        log_on_failure  += USERID
        disable         = no
        only_from       = ip1, ip2, ip3, etc..
}
command_check_interval= -1
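The `instances = UNLIMITED` line means xinetd itself puts no cap on how many nsca processes run at once, which may be relevant to the memory exhaustion described above. The sketch below reads a service block like the one in this message and points out such settings. The function names `parse_xinetd_service` and `review` are hypothetical, the parser handles only simple `key = value` and `key += value` lines, and it cannot see values inherited from the `defaults` section of /etc/xinetd.conf.

```python
def parse_xinetd_service(text):
    """Collect `key = value` and `key += value` lines found between braces."""
    attrs = {}
    inside = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line == "{":
            inside = True
        elif line == "}":
            inside = False
        elif inside and "=" in line:
            key, value = line.split("=", 1)
            # Strip the "+" or "-" of "+=" / "-=" from the key.
            attrs[key.rstrip("+- ").strip()] = value.strip()
    return attrs


def review(attrs):
    """Return notes on settings that leave the process count uncapped."""
    notes = []
    if attrs.get("instances", "").upper() == "UNLIMITED":
        notes.append("instances = UNLIMITED: no cap on concurrent nsca processes")
    if "per_source" not in attrs:
        notes.append("per_source not set in this block: no per-sender cap here")
    return notes


# Service block as quoted in this message (only_from line omitted).
SAMPLE = """\
service nsca
{
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/sbin/nsca
        server_args     = -c /home/e-smith/nagios/nsca.cfg --inetd
        cps             = 9000 30
        instances       = UNLIMITED
        log_on_failure  += USERID
        disable         = no
}
"""

if __name__ == "__main__":
    for note in review(parse_xinetd_service(SAMPLE)):
        print(note)
```

Whether a numeric `instances` or `per_source` value actually helps here is a separate question; the sketch only makes the current settings visible.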
System info:
SME server 6.01 (2.4.20-18.7, i686)
Perl v5.6.1
Apache/1.3.27
Nagios 1.2
Any advice would be great... thanks.
Chris
-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log
files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD
SPLUNK!
http://ads.osdn.com/?ad_idv37&alloc_id865&op=click
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when
reporting any issue. ::: Messages without supporting info will risk
being sent to /dev/null