Re: [Nagios-users] Nagios checks and DNS queries

2008-12-03 Thread Steve Burton
Dirk H. Schulz wrote:
 Hi folks,

 I have two servers running Nagios, one is 2.3.1 on Debian, the other 3.0.5 
 on CentOS. With both I have a peculiar problem:

 Both of the servers have 3 different nameservers in /etc/resolv.conf, but 
 when the first nameserver fails, more than half of the service checks 
 fail (plugin timed out). The failure is not brief: it persists for as 
 long as the first nameserver is down.
 This first nameserver in /etc/resolv.conf is not the master nameserver (all 
 of them are slaves), so it is not a problem of the slave stopping answering 
 when the master fails or any misconfiguration between the nameservers.

 This should not be occurring, but it can be reproduced reliably. Now I hope 
 that there is some configuration item I overlooked, but googling did not 
 deliver any hint.

 Any help is appreciated.

 Dirk

   
Dirk,

My solution was to run a slave name server on the Nagios server itself, 
restricted to answering queries only from localhost.
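
For anyone wanting to try this, here is a rough sketch of such a setup, 
assuming BIND 9 (the thread does not say which name server software was 
used). The zone name, master address and file path are placeholders, not 
values from this thread.

```
// named.conf (fragment); all names, paths and addresses are placeholders
options {
    listen-on { 127.0.0.1; };      // listen on the loopback interface only
    allow-query { localhost; };    // answer queries from this host only
};

zone "example.com" {
    type slave;
    masters { 192.0.2.10; };       // hypothetical master name server
    file "slaves/example.com.zone";
};
```

The Nagios host would then presumably list 127.0.0.1 as its first 
nameserver in /etc/resolv.conf.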

Steve.

 -
 This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
 Build the coolest Linux based applications with Moblin SDK  win great prizes
 Grand prize is a trip for two to an Open Source event anywhere in the world
 http://moblin-contest.org/redirect.php?banner_id=100url=/
 ___
 Nagios-users mailing list
 Nagios-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/nagios-users
 ::: Please include Nagios version, plugin version (-v) and OS when reporting 
 any issue. 
 ::: Messages without supporting info will risk being sent to /dev/null

   


__
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email 
__



Re: [Nagios-users] Nagios checks and DNS queries

2008-12-03 Thread Sean McAfee
Steve Burton wrote:
 Dirk,

 My solution was to run a slave name server on the Nagios server itself, 
 restricted to answering queries only from localhost.

 Steve.

Why not set something like "options timeout:1 attempts:1" in 
resolv.conf?  From man resolv.conf:

timeout:n
sets the amount of time the resolver will wait for a response from a 
remote name server before retrying the query via a different name 
server. Measured in seconds, the default is RES_TIMEOUT (currently 5, 
see resolv.h http://linux.die.net/include/resolv.h).

attempts:n
sets the number of times the resolver will send a query to its name 
servers before giving up and returning an error to the calling 
application. The default is RES_DFLRETRY (currently 2, see resolv.h 
http://linux.die.net/include/resolv.h).
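
As a sketch, the suggested setting would sit in /etc/resolv.conf roughly 
like this. The nameserver addresses are placeholders from the 
documentation range, not the ones used in this thread.

```
# /etc/resolv.conf (fragment); nameserver addresses are placeholders
nameserver 192.0.2.1
nameserver 192.0.2.2
nameserver 192.0.2.3
# wait at most 1 second for a reply, and make only 1 attempt
options timeout:1 attempts:1
```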

With the defaults, you're looking at 10 seconds (2 attempts, 5s apart) 
before it moves onto the next server.  Since 10 seconds is the default 
timeout for those checks, you'll always hit a timeout unless the DNS 
server becomes responsive again.
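
To see how long a single lookup actually takes while the first nameserver 
is down, a small timing script can help. This is only a minimal sketch in 
Python: the function name time_lookup is made up for illustration, and it 
relies on nothing but the standard socket.getaddrinfo call, which goes 
through the system resolver (so /etc/hosts entries or a local cache can 
change the result).

```python
import socket
import time


def time_lookup(hostname):
    """Resolve hostname once; return (elapsed_seconds, error_or_None)."""
    start = time.monotonic()
    try:
        # getaddrinfo uses the system resolver, including /etc/resolv.conf
        socket.getaddrinfo(hostname, None)
        error = None
    except socket.gaierror as exc:
        # Resolution failed; keep the error so the caller can report it
        error = exc
    return time.monotonic() - start, error
```

Calling time_lookup() with one of the monitored host names, before and 
after changing the options line, should show whether the lookup time 
stays below the plugin timeout.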

-- 
Sean McAfee
System Engineer

Collaborative Fusion, Inc.
 [EMAIL PROTECTED]
 412-422-3463 x 4025

5849 Forbes Avenue
Pittsburgh, PA 15217


IMPORTANT: This message contains confidential information
and is intended only for the individual named. If the reader of
this message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.







Re: [Nagios-users] Nagios checks and DNS queries

2008-12-03 Thread Steve Burton
Sean McAfee wrote:
 Steve Burton wrote:
   
 Dirk,

 My solution was to run a slave name server on the Nagios server itself, 
 restricted to answering queries only from localhost.

 Steve.

 
 Why not set something like "options timeout:1 attempts:1" in 
 resolv.conf?  From man resolv.conf:

 timeout:n
 sets the amount of time the resolver will wait for a response from a 
 remote name server before retrying the query via a different name 
 server. Measured in seconds, the default is RES_TIMEOUT (currently 5, 
 see resolv.h http://linux.die.net/include/resolv.h).

 attempts:n
 sets the number of times the resolver will send a query to its name 
 servers before giving up and returning an error to the calling 
 application. The default is RES_DFLRETRY (currently 2, see resolv.h 
 http://linux.die.net/include/resolv.h).

 With the defaults, you're looking at 10 seconds (2 attempts, 5s apart) 
 before it moves onto the next server.  Since 10 seconds is the default 
 timeout for those checks, you'll always hit a timeout unless the DNS 
 server becomes responsive again.

   
Sean,

The reason I set up the slave server was so my Nagios instance could 
monitor the 'real' DNS servers by name and check the host and other 
services on those hosts (they're Windows DCs) even if (or especially if) 
the DNS service had failed.

Steve.




Re: [Nagios-users] Nagios checks and DNS queries

2008-12-03 Thread Sean McAfee
Steve Burton wrote:
 Sean,

 The reason I set up the slave server was so my Nagios instance could 
 monitor the 'real' DNS servers by name and check the host and other 
 services on those hosts (they're Windows DCs) even if (or especially 
 if) the DNS service had failed.
That makes sense.  We make use of the options in resolv.conf to allow us 
to reboot the DNS servers without causing any interruptions.

For Dirk though, this isn't so much a Nagios or monitoring issue as it 
is a general *nix one.  The real question here is why the DNS servers 
become unavailable so frequently.

-- 
Sean McAfee
System Engineer

Collaborative Fusion, Inc.
 [EMAIL PROTECTED]
 412-422-3463 x 4025

5849 Forbes Avenue
Pittsburgh, PA 15217




[Nagios-users] Nagios checks and DNS queries

2008-12-02 Thread Dirk H. Schulz
Hi folks,

I have two servers running Nagios, one is 2.3.1 on Debian, the other 3.0.5 
on CentOS. With both I have a peculiar problem:

Both of the servers have 3 different nameservers in /etc/resolv.conf, but 
when the first nameserver fails, more than half of the service checks 
fail (plugin timed out). The failure is not brief: it persists for as 
long as the first nameserver is down.
This first nameserver in /etc/resolv.conf is not the master nameserver (all 
of them are slaves), so it is not a problem of the slave stopping answering 
when the master fails or any misconfiguration between the nameservers.

This should not be occurring, but it can be reproduced reliably. Now I hope 
that there is some configuration item I overlooked, but googling did not 
deliver any hint.

Any help is appreciated.

Dirk
