FYI, while waiting for Nagios to start polling anything after startup, I
decided to look at what it's doing. The backtrace below shows the main
thread sitting in __nanosleep_nocancel() called from event_execution_loop(),
which would explain why it starts up and sits there consuming no cycles
and not polling. A sleep left in the code? The log entries below come a
few minutes apart (119 and 175 seconds).
This is running 2.0b6 on x86_64, compiled from source with perlcache.
/eli
###FILE: nagios.log:
[1134076786] Finished daemonizing... (New PID=11914)
[1134076905] service_result_worker_thread(): poll(): EINTR (impossible)
[1134077080] service_result_worker_thread(): poll(): EINTR (impossible)
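The 119 and 175 second gaps can be reproduced from the bracketed Unix timestamps at the start of each log line. A minimal Python sketch, assuming every relevant line begins with "[epoch]" as in this excerpt; the helper names timestamps() and gaps() are hypothetical, not part of Nagios:

```python
import re

# Log lines copied from the nagios.log excerpt above.
LOG = """\
[1134076786] Finished daemonizing... (New PID=11914)
[1134076905] service_result_worker_thread(): poll(): EINTR (impossible)
[1134077080] service_result_worker_thread(): poll(): EINTR (impossible)
"""

# Assumption: each log line starts with a Unix timestamp in square brackets.
TIMESTAMP = re.compile(r"^\[(\d+)\]")


def timestamps(text: str) -> list[int]:
    """Return the leading Unix timestamp of every line that has one."""
    return [int(m.group(1)) for line in text.splitlines() if (m := TIMESTAMP.match(line))]


def gaps(stamps: list[int]) -> list[int]:
    """Seconds elapsed between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(gaps(timestamps(LOG)))
```

Running it on a longer log would show whether the EINTR entries keep arriving at similar multi-minute intervals.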
### GDB info:
Attaching to program: /usr/local/nagios/bin/nagios, process 11914
Reading symbols from /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/CORE/libperl.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/CORE/libperl.so
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/tls/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/tls/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 182894164416 (LWP 11914)]
[New Thread 1094719840 (LWP 11917)]
[New Thread 1084229984 (LWP 11915)]
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/tls/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /usr/lib64/libltdl.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libltdl.so.3
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000364700b9c5 in __nanosleep_nocancel () from /lib64/tls/libpthread.so.0
(gdb) where
#0 0x000000364700b9c5 in __nanosleep_nocancel () from /lib64/tls/libpthread.so.0
#1 0x00000000004209aa in event_execution_loop ()
#2 0x000000000040efa0 in main ()
(gdb) info registers
rax            0xfffffffffffffdfc  -516
rbx            0x861bb0            8788912
rcx            0xffffffffffffffff  -1
rdx            0x2                 2
rsi            0x0                 0
rdi            0x7fbffff450        548682069072
rbp            0x0                 0x0
rsp            0x7fbffff410        0x7fbffff410
r8             0x0                 0
r9             0x2e8a              11914
r10            0x7fbffff301        548682068737
r11            0x202               514
r12            0x7fbffff450        548682069072
r13            0xffffffff          4294967295
r14            0xffffffff          4294967295
r15            0x7fbffffa08        548682070536
rip            0x364700b9c5        0x364700b9c5 <__nanosleep_nocancel+60>
eflags         0x202               514
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
Fred wrote:
I do the same thing with check_icmp, except that I use sudo and create
a simple sudo entry like this (see the CHECKICMP alias):
Cmnd_Alias CHECKALLSSHKEYS = /opt/hptc/nagios/libexec/check_keys # HP-HPTC-KeySync
Cmnd_Alias CHECKSYSLOGALERTS = /opt/hptc/nagios/libexec/check_syslogalerts # HP-HPTC-SysLog
Cmnd_Alias CHECKSFS = /opt/hptc/nagios/libexec/check_sfs # HP-HPTC-SysLog
Cmnd_Alias CHECKLSF = /opt/hptc/nagios/libexec/check_lsf # HP-HPTC-CheckLSF
Cmnd_Alias CHECKICMP = /opt/hptc/nagios/libexec/check_icmp # HP-HPTC-CheckICMP
nagios ALL = NOPASSWD: CHECKALLSSHKEYS,CHECKSYSLOGALERTS,CHECKSFS,CHECKLSF,CHECKICMP # HP-HPTC-Nagios
I just built 2.0b5 and hope to try it in the next few days on a
700+ node system. I am hoping that it *solves* the delay problem
that existed in previous releases.
-FredC
Eli Stair <[EMAIL PROTECTED]> wrote:
I'm running a fresh build of 2.0b5 on x86_64. After starting Nagios,
it can take up to 10 minutes for the first host or service checks to
begin, and the nagios process uses no CPU during that time. I have over
1000 hosts to check, and I have reduced max_host_check_spread and
max_service_check_spread to make sure Nagios is not simply spreading
the initial checks out over a long window.
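To see why reducing the spread should bound the initial schedule, here is a rough Python model of the smart ("s") inter-check delay method. It is a simplified sketch, not Nagios source: it assumes the smart delay is the average check interval divided evenly across all checks, capped so every initial check lands within max_*_check_spread time units of interval_length seconds. The 30-minute average host check interval is a hypothetical figure; the other values (about 1000 hosts, max_host_check_spread=10, interval_length=60) come from this thread.

```python
def smart_inter_check_delay(avg_check_interval_min: float, num_checks: int,
                            max_spread_min: float, interval_length_s: int = 60) -> float:
    """Approximate seconds between initial checks (simplified model, not Nagios code).

    The average check interval is divided evenly across all checks, then the
    delay is capped so the last initial check falls inside the spread window.
    """
    if num_checks <= 0:
        return 0.0
    delay = avg_check_interval_min * interval_length_s / num_checks
    cap = max_spread_min * interval_length_s / num_checks
    return min(delay, cap)


def initial_check_offsets(num_checks: int, delay_s: float) -> list[float]:
    """Seconds after startup at which each initial check is scheduled (no interleaving)."""
    return [i * delay_s for i in range(num_checks)]


if __name__ == "__main__":
    # Hypothetical: 1000 hosts, 30-minute average interval, spread of 10 minutes.
    delay = smart_inter_check_delay(30, 1000, 10)
    offsets = initial_check_offsets(1000, delay)
    print(f"delay={delay:.2f}s first={offsets[0]:.1f}s last={offsets[-1]:.1f}s")
```

Under this model the spread cap only moves the last initial check to roughly 600 seconds after startup; the first one is still scheduled immediately, so the spread settings alone would not account for a 10-minute wait before any check runs.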
This problem does NOT occur on a 2.0b3 build with exactly the same
configuration.
Once the checks DO start, a full round of checks can take hours to
complete. I changed nagios_user to root so that the host check can be
check_icmp -t 1 -p 1.
Unfortunately, even with this setup, having anywhere from 4 to 64 hosts
go down can make the "monitoring" aspect effectively useless.
Any suggestions on the problem of startup lag?
Any ways to further speed up the host check runs, aside from using
check_icmp?
Thanks,
/eli
### inline nagios.cfg:
[EMAIL PROTECTED] etc]# cat nagios.cfg | egrep -v "^#|^$"
log_file=/var/log/nagios/nagios.log
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_dir=/usr/local/nagios/etc/config
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/customcommands.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
object_cache_file=/usr/local/nagios/var/objects.cache
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/var/status.dat
nagios_user=root
nagios_group=root
check_external_commands=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
comment_file=/usr/local/nagios/var/comments.dat
downtime_file=/usr/local/nagios/var/downtime.dat
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
event_broker_options=-1
log_rotation_method=d
log_archive_path=/var/log/nagios/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=15
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=10
max_concurrent_checks=0
service_reaper_frequency=15
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/usr/local/nagios/var/retention.dat
retention_update_interval=0
use_retained_program_state=1
use_retained_scheduling_info=0
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=0
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=0
obsess_over_services=0
check_for_orphaned_services=0
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=60
aggregate_status_updates=1
status_update_interval=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=iso8601
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=nagios
admin_pager=pagenagios
daemon_dumps_core=0
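The egrep filter above only hides comment and blank lines. A small Python sketch can do roughly the same while also collecting the directives and flagging any line that does not parse as name=value, which would catch a mangled directive name containing a space. The helper parse_main_config() is hypothetical, not part of Nagios, and it assumes directive names consist of lowercase letters, digits and underscores, as in the listing above.

```python
import re

# Assumption: main-config directives look like "lowercase_name=value".
DIRECTIVE = re.compile(r"^([a-z0-9_]+)=(.*)$")


def parse_main_config(text: str) -> tuple[dict[str, list[str]], list[str]]:
    """Split nagios.cfg text into (settings, suspicious_lines).

    Comment and blank lines are skipped, similar to egrep -v "^#|^$".
    Repeatable directives such as cfg_file keep every value in a list.
    Lines that do not match name=value are returned separately.
    """
    settings: dict[str, list[str]] = {}
    suspicious: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = DIRECTIVE.match(line)
        if match is None:
            suspicious.append(line)
            continue
        name, value = match.groups()
        settings.setdefault(name, []).append(value)
    return settings, suspicious
```

For example, feeding it a file containing "temp _file=/tmp/x" would return that line in suspicious_lines instead of silently dropping it.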
-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through
log files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
Nagios-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when
reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null