Re: [Nagios-users] Scheduling Queue stuck a few minutes after restart
Hi Chris, thanks for your reply. I just upgraded to Nagios 3.2.1-2~bpo50+1, but nothing has changed. :'(

--
Learn how Oracle Real Application Clusters (RAC) One Node allows customers to consolidate database storage, standardize their database environment, and, should the need arise, upgrade to a full multi-node Oracle RAC database without downtime or disruption
http://p.sf.net/sfu/oracle-sfdevnl
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
[Nagios-users] Scheduling Queue stuck a few minutes after restart
Hello,

I have a really odd issue running Nagios: a few minutes after startup, the scheduling queue seems to freeze and no more active checks are performed. The queue stays stuck for hours until I manually restart Nagios. Passive checks are processed normally.

I'm running Nagios 3.0.6 (deb package) on a Debian lenny system. The hardware is an 8-core Xeon CPU with 16 GB RAM. Nagios is monitoring about 1K hosts and 10K services.

Reverting the configuration to the last known good configuration did not help, and neither did rebooting the server or several Nagios restarts and reloads.

Fixes already tried:
- disabled all active host checks
- increased the ulimit for the nagios user
- disabled all event handlers
- disabled all obsess options

Any help or hint would be appreciated. nagios.cfg follows:

***
log_file=/nagios_fe/var/log/nagios3/nagios.log
cfg_file=/etc/nagios3/commands.cfg
cfg_dir=/etc/nagios-plugins/config
cfg_dir=/nagios_fe/etc/cmon/nagios3
cfg_dir=/nagios_fe/etc/nagiosgrapher/nagios3
object_cache_file=/nagios_fe/var/cache/nagios3/objects.cache
precached_object_file=/nagios_fe/var/lib/nagios3/objects.precache
resource_file=/nagios_fe/etc/cmon/nagios3/macros.res
status_file=/nagios_fe/var/cache/nagios3/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/nagios_fe/var/lib/nagios3/rw/nagios.cmd
external_command_buffer_slots=4096
lock_file=/nagios_fe/var/run/nagios3/nagios3.pid
temp_file=/nagios_fe/var/cache/nagios3/nagios.tmp
temp_path=/tmp
event_broker_options=-1
log_rotation_method=d
log_archive_path=/nagios_fe/var/log/nagios3/archives
use_syslog=0
log_notifications=1
log_service_retries=0
log_host_retries=0
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=0
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=10
max_check_result_reaper_time=30
check_result_path=/nagios_fe/var/lib/nagios3/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/nagios_fe/var/lib/nagios3/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=0
process_performance_data=1
service_perfdata_file=/nagios_fe/var/lib/nagiosgrapher/ngraph.pipe
service_perfdata_file_template=$HOSTNAME$\t$SERVICEDESC$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\t$TIMET$\n
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=5
service_perfdata_file_processing_command=ngraph-process-service-perfdata-pipe
obsess_over_services=0
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=euro
p1_file=/usr/lib/nagios3/p1.pl
enable_embedded_perl=0
use_embedded_perl_implicitly=1
illegal_object_name_chars=`~!$%^*|'?,()=
illegal_macro_output_chars=`~$|'
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=r...@localhost
admin_pager=pager...@localhost
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=0
debug_level=144
debug_verbosity=1
debug_file=/nagios_fe/var/log/nagios3/nagios.debug
max_debug_file_size=20
***
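The posted nagios.cfg sets both service_inter_check_delay_method and service_interleave_factor to "s" (smart). As I understand the Nagios 3 check-scheduling documentation, the smart interleave factor is ceil(total services / total hosts) and the smart inter-check delay is the average check interval divided by the total number of services. The sketch below parses a nagios.cfg fragment (keeping repeated keys such as cfg_dir) and applies those two formulas to the numbers from the post. It is only a back-of-envelope aid, not a model of what Nagios does internally: the 5-minute average check interval is an assumption (the post does not state service intervals), the helper names are hypothetical, and the max_service_check_spread value is only printed for comparison, not applied as a cap.

```python
import math

def parse_main_cfg(text: str) -> dict[str, list[str]]:
    """Parse nagios.cfg-style key=value lines.

    Keys such as cfg_dir may repeat, so every value is kept in a list.
    Blank lines and lines starting with '#' are skipped.
    """
    settings: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once: values may contain '='
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings

def smart_interleave_factor(total_services: int, total_hosts: int) -> int:
    """Smart interleave factor: ceil(total services / total hosts)."""
    if total_hosts <= 0:
        return 1
    return math.ceil(total_services / total_hosts)

def smart_inter_check_delay(avg_check_interval_s: float, total_services: int) -> float:
    """Smart inter-check delay in seconds: average check interval / total services."""
    if total_services <= 0:
        return 0.0
    return avg_check_interval_s / total_services

if __name__ == "__main__":
    # Fragment copied from the posted nagios.cfg
    cfg = parse_main_cfg(
        "service_inter_check_delay_method=s\n"
        "service_interleave_factor=s\n"
        "max_service_check_spread=30\n"
    )
    hosts, services = 1000, 10000  # "about 1K hosts and 10K services"
    avg_interval_s = 5 * 60        # ASSUMPTION: the post does not state check intervals

    if cfg["service_interleave_factor"][0] == "s":
        print("interleave factor:", smart_interleave_factor(services, hosts))
    if cfg["service_inter_check_delay_method"][0] == "s":
        delay = smart_inter_check_delay(avg_interval_s, services)
        spread_min = delay * services / 60  # time to lay out every initial check once
        max_spread = int(cfg["max_service_check_spread"][0])
        print(f"inter-check delay: {delay:.3f} s")
        print(f"initial spread: {spread_min:.1f} min (max_service_check_spread={max_spread})")
```

With these inputs the interleave factor is 10 and the delay is 0.030 s, so one pass over all 10K services takes about 5 minutes, well inside the configured 30-minute spread. Under these assumptions the smart settings themselves don't look extreme; changing avg_interval_s to your real average is the first thing to do before drawing any conclusion.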
[Nagios-users] NRPE vs NSCA benchmarking
Hi everybody,

has any of you ever tried to benchmark NRPE and NSCA performance? It's generally acknowledged that passive checks (NSCA) move the load from the Nagios server to the monitored machines, so from the Nagios point of view they need fewer resources. But by how much?

I'm trying to design a distributed monitoring setup to handle a total of 15-20K services, so I have:
- 1 master Nagios server with ALL 15K services (passive)
- an undefined number of slave servers, each monitoring a portion of the 15K services

The slaves perform the checks and report the results to the master. This is the point: how should the slaves perform the checks, with NRPE or NSCA?

I thought about some pros and cons; here is the result:
- NRPE PROS: centralized configuration, less error-prone
- NRPE CONS: centralized load limits scalability
- NSCA PROS/CONS: the opposite

Any other suggestion or benchmarking tool/approach would be appreciated.

Bye
Maurizio Pinotti
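One simple starting point for benchmarking is to time the client-side commands themselves and compare that against the result rate the master has to absorb. The sketch below runs a command repeatedly with subprocess, reports the median wall-clock time, and works out the average results-per-second for 15K services. The stand-in command (a no-op Python process) is a placeholder: you would substitute the actual check_nrpe or send_nsca command lines from your own setup. The 5-minute check interval and the 4-slave split are hypothetical values for illustration, not figures from the post. It measures per-invocation latency only on the machine where it runs; it says nothing about CPU load or check latency on the master, which you would still need to watch separately (for example with nagiostats).

```python
import statistics
import subprocess
import sys
import time

def time_command(argv: list[str], runs: int = 20) -> list[float]:
    """Run argv `runs` times; return each run's wall-clock duration in seconds."""
    durations: list[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; check=False so a non-zero plugin exit code
        # (WARNING/CRITICAL) does not abort the benchmark.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations

def required_results_per_second(total_services: int, check_interval_s: float) -> float:
    """Average rate at which results arrive if every service runs once per interval."""
    return total_services / check_interval_s

if __name__ == "__main__":
    # PLACEHOLDER command: swap in your real check_nrpe / send_nsca invocation.
    stand_in = [sys.executable, "-c", "pass"]
    samples = time_command(stand_in, runs=10)
    print(f"median: {statistics.median(samples) * 1000:.1f} ms over {len(samples)} runs")

    total_services = 15000  # from the post
    interval_s = 5 * 60     # ASSUMPTION: 5-minute check interval
    slaves = 4              # ASSUMPTION: hypothetical number of slaves
    rate = required_results_per_second(total_services, interval_s)
    print(f"master must absorb ~{rate:.1f} results/s; ~{rate / slaves:.1f} checks/s per slave")
```

Under these assumptions the master receives about 50 results per second on average (12.5 checks per second per slave with four slaves). The median is used rather than the mean so that a single slow run does not skew the figure.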