Hi Sascha,
It seems that each host ping check launches 3 processes: sh, ping, and
nagios. I currently have ~57 hosts that are in a down state and have been
acknowledged as out of service. I would assume that 57*3 = 171 processes, each
lingering until the 30 second timeout expires, could account for this many
processes running at the same time.
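
As a rough back-of-the-envelope check of that estimate, here is a small sketch. The 3 processes per check, the 57 down hosts, the 250 warning threshold and the 370 observed processes all come from this thread. The assumption that every down-host check overlaps within one timeout window is a worst case I have not measured.

```python
# Worst-case estimate of processes spawned by host checks against down hosts.
PROCS_PER_HOST_CHECK = 3   # sh, ping, nagios (as observed)
DOWN_HOSTS = 57            # hosts acknowledged as out of service
WARNING_THRESHOLD = 250    # default warning level in localhost.cfg
OBSERVED_TOTAL = 370       # count reported in the PROCS WARNING alert


def worst_case_check_procs(down_hosts, procs_per_check):
    """Processes alive if every down-host check overlaps in one timeout window."""
    return down_hosts * procs_per_check


extra = worst_case_check_procs(DOWN_HOSTS, PROCS_PER_HOST_CHECK)
remaining = OBSERVED_TOTAL - extra

print(extra)                          # 171
print(remaining)                      # 199
print(remaining < WARNING_THRESHOLD)  # True
```

If that worst case holds, the down hosts alone would be enough to push the total from below the warning threshold to above it.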
I guess that brings me to my next question. I could disable active host checks
for these out of service machines, which would most likely get rid of the
warnings about the number of processes, but would I then have to re-enable
them by hand once the machines are brought back up? At the moment, when a
machine is put out of service, I just acknowledge the problem and leave a
comment, since the machine will be back at some point. When it does come
back, the acknowledgement is cleared and regular checks are still running.
Does anyone know of a better way to do this?
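
For reference, this is roughly how I imagine scripting the disable/re-enable step through the external command file. It is a minimal sketch, assuming external commands are enabled in nagios.cfg. The command file path below is the usual source-install location and may differ on your system, and the host name is made up.

```python
import time

# Assumed path; the real one is set by command_file in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def host_check_command(host_name, enable, now=None):
    """Build an ENABLE_HOST_CHECK or DISABLE_HOST_CHECK external command line."""
    timestamp = int(time.time() if now is None else now)
    action = "ENABLE_HOST_CHECK" if enable else "DISABLE_HOST_CHECK"
    return "[{}] {};{}\n".format(timestamp, action, host_name)


def submit(command_line, command_file=COMMAND_FILE):
    """Write one command line to the Nagios command pipe."""
    with open(command_file, "w") as pipe:
        pipe.write(command_line)


# Build (but do not send) the command for a hypothetical out-of-service host.
print(host_check_command("oos-server-01", enable=False, now=1220367600), end="")
```

Calling submit() with the matching enable=True line would turn checks back on, which is the manual step I was hoping to avoid.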
Thanks so much,
Ryan Gravlin
________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, September 02, 2008 11:01 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Re: Default Nagios process self-check
[EMAIL PROTECTED] wrote on 02.09.2008 15:37:47:
> # of Hosts Monitored: 322
> # of Services Monitored: 35
>
> The localhost.cfg comes with a default process check with the values
> 250+ for warnings and 400+ for critical. Usually about twice an
> hour from checking the event log I get this message:
>
> [09-02-2008 07:02:48] SERVICE ALERT: NAGIOS;Total Processes;WARNING;
> SOFT;1;PROCS WARNING: 370 processes with STATE = RSZDT
>
> It seems to me the machine itself is powerful enough to execute this
> many checks without even breaking a sweat. Were these defaults
> configured in the thinking that there should never be that many processes?
>
> I'm by no means a Linux or Nagios expert and I was hoping someone
> could explain more of the thinking behind this check than what I
> see. I can obviously just bump the numbers up but I want to make
> sure that I'm not ignoring something obvious that may have unwanted
> results after the fact. Should I use these numbers I see here as
> the basis for my new thresholds?
These thresholds were never meant to be an upper limit; the maximum number
of concurrent checks your box can handle depends solely on your hardware.
Think of it more as "if you have a Nagios installation which produces
that many concurrent checks, then by now you should know how to
change this behaviour" ;-)
That said, I fail to see how your setup with 322 hosts and only 35(?) service
checks could produce that many processes. It may be a good idea to
double-check what's going on there.
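
One way to see where the processes come from is to tally them by command name when the warning fires. The following is only a sketch: the function names are made up for illustration, the sample data is invented, and it assumes a `ps` that accepts `-e -o comm=`.

```python
import subprocess
from collections import Counter


def tally_commands(ps_output):
    """Count processes per command name, given one name per input line."""
    names = [line.strip() for line in ps_output.splitlines() if line.strip()]
    return Counter(names)


def snapshot():
    """Capture the command name of every running process via ps."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Invented sample standing in for real ps output.
sample = "nagios\nsh\nping\nnagios\nsh\nping\ncron\n"
for name, count in tally_commands(sample).most_common(3):
    print(name, count)
```

On the Nagios box, tally_commands(snapshot()) would give the real numbers; a large share of sh and ping entries would point at the host checks.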
S