Re: [Nagios-users] Scheduling Queue stuck a few minutes after restart

2010-12-30 Thread Maurizio Pinotti
Hi Chris, thanks for your reply. I just upgraded to Nagios 3.2.1-2~bpo50+1, but
nothing has changed :'(


--
Learn how Oracle Real Application Clusters (RAC) One Node allows customers
to consolidate database storage, standardize their database environment, and, 
should the need arise, upgrade to a full multi-node Oracle RAC database 
without downtime or disruption
http://p.sf.net/sfu/oracle-sfdevnl
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] Scheduling Queue stuck a few minutes after restart

2010-12-28 Thread Maurizio Pinotti
Hello,

I have a really odd issue running Nagios: a few minutes after startup, the
scheduling queue seems to freeze and no more active checks are performed. The
queue stays stuck for hours until I manually restart Nagios.

Passive checks are processed normally.
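
One way to confirm that the queue has really frozen, rather than just running late, is to look for active service checks whose scheduled next_check time is well in the past. The helper below is a hypothetical sketch, not part of Nagios: it assumes the Nagios 3 status.dat layout of `servicestatus { key=value ... }` blocks with `host_name`, `service_description`, `active_checks_enabled` and `next_check` fields, and it uses the status_file path from the nagios.cfg in this message. The 5-minute grace period is an arbitrary choice.

```python
import os
import time
from typing import Dict, Iterator, List, Optional

# status_file from the nagios.cfg in this message (adjust if yours differs)
STATUS_FILE = "/nagios_fe/var/cache/nagios3/status.dat"


def parse_status_blocks(text: str) -> Iterator[Dict[str, str]]:
    """Yield each 'type { key=value ... }' block as a dict; the type goes under '__type__'."""
    block: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if block is None and line.endswith("{"):
            block = {"__type__": line[:-1].strip()}
        elif block is not None and line == "}":
            yield block
            block = None
        elif block is not None and "=" in line:
            # split on the first '=' only; values such as plugin output may contain '='
            key, _, value = line.partition("=")
            block[key] = value


def overdue_services(text: str, now: float, grace: int = 300) -> List[str]:
    """Return 'host;service' for active checks whose next_check is more than `grace` seconds ago."""
    late = []
    for b in parse_status_blocks(text):
        if b["__type__"] != "servicestatus" or b.get("active_checks_enabled") != "1":
            continue
        next_check = int(b.get("next_check", "0") or 0)
        if next_check and now - next_check > grace:
            late.append(f"{b.get('host_name')};{b.get('service_description')}")
    return late


if __name__ == "__main__":
    if os.path.exists(STATUS_FILE):
        with open(STATUS_FILE) as fh:
            late = overdue_services(fh.read(), time.time())
        print(f"{len(late)} active service checks overdue by more than 5 minutes")
        for name in late[:20]:
            print("  " + name)
    else:
        print(f"{STATUS_FILE} not found")
```

If the count keeps climbing a few minutes after a restart while passive results still arrive, that points at the scheduler rather than at individual plugins.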

I'm running Nagios 3.0.6 (deb package) on a Debian lenny system. The hardware is
an 8-core Xeon with 16 GB of RAM. Nagios is monitoring about 1,000 hosts and
10,000 services.

Reverting to the last known good configuration did not help, nor did
rebooting the server or several Nagios restarts and reloads.

Fixes I have already tried:

- disabled all active host checks
- increased the ulimit for the nagios user
- disabled all event handlers
- disabled all obsess-over options (obsess_over_services, obsess_over_hosts)

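Limits raised with ulimit in a login shell do not necessarily apply to a daemon started from an init script. On Linux kernels that provide /proc/<pid>/limits, the running process's actual limits can be read directly. The helper below is a hypothetical sketch: it takes the PID from the lock_file path in the nagios.cfg further down and falls back to its own PID if that file cannot be read. Depending on permissions it may need to run as root or as the nagios user.

```python
import os
from typing import Dict, Tuple

# lock_file from the nagios.cfg in this message; the daemon writes its PID there
LOCK_FILE = "/nagios_fe/var/run/nagios3/nagios3.pid"
WANTED = ("Max open files", "Max processes")


def daemon_limits(pid: int) -> Dict[str, Tuple[str, str]]:
    """Return {limit name: (soft, hard)} for selected rows of /proc/<pid>/limits (Linux only)."""
    found: Dict[str, Tuple[str, str]] = {}
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            for name in WANTED:
                if line.startswith(name):
                    # row layout: <name> <soft> <hard> <units>
                    soft, hard = line[len(name):].split()[:2]
                    found[name] = (soft, hard)
    return found


def nagios_pid(lock_file: str = LOCK_FILE) -> int:
    """Read the daemon PID from the lock file, falling back to this script's own PID."""
    try:
        with open(lock_file) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return os.getpid()


if __name__ == "__main__":
    for name, (soft, hard) in daemon_limits(nagios_pid()).items():
        print(f"{name}: soft={soft} hard={hard}")
```
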


Any help or hint would be appreciated.

My nagios.cfg follows:

***

log_file=/nagios_fe/var/log/nagios3/nagios.log
cfg_file=/etc/nagios3/commands.cfg
cfg_dir=/etc/nagios-plugins/config
cfg_dir=/nagios_fe/etc/cmon/nagios3
cfg_dir=/nagios_fe/etc/nagiosgrapher/nagios3
object_cache_file=/nagios_fe/var/cache/nagios3/objects.cache
precached_object_file=/nagios_fe/var/lib/nagios3/objects.precache
resource_file=/nagios_fe/etc/cmon/nagios3/macros.res
status_file=/nagios_fe/var/cache/nagios3/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/nagios_fe/var/lib/nagios3/rw/nagios.cmd
external_command_buffer_slots=4096
lock_file=/nagios_fe/var/run/nagios3/nagios3.pid
temp_file=/nagios_fe/var/cache/nagios3/nagios.tmp
temp_path=/tmp
event_broker_options=-1
log_rotation_method=d
log_archive_path=/nagios_fe/var/log/nagios3/archives
use_syslog=0
log_notifications=1
log_service_retries=0
log_host_retries=0
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=0
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=10
max_check_result_reaper_time=30
check_result_path=/nagios_fe/var/lib/nagios3/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/nagios_fe/var/lib/nagios3/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=0
process_performance_data=1
service_perfdata_file=/nagios_fe/var/lib/nagiosgrapher/ngraph.pipe
service_perfdata_file_template=$HOSTNAME$\t$SERVICEDESC$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\t$TIMET$\n
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=5
service_perfdata_file_processing_command=ngraph-process-service-perfdata-pipe
obsess_over_services=0
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=euro
p1_file=/usr/lib/nagios3/p1.pl
enable_embedded_perl=0
use_embedded_perl_implicitly=1
illegal_object_name_chars=`~!$%^*|'?,()=
illegal_macro_output_chars=`~$|'
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=r...@localhost
admin_pager=pager...@localhost
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=0
debug_level=144
debug_verbosity=1
debug_file=/nagios_fe/var/log/nagios3/nagios.debug
max_debug_file_size=20

***
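
To examine the settings above programmatically, a minimal parser for the `key=value` format is enough, as long as it keeps repeated directives such as cfg_dir as lists. The sketch below also decodes debug_level as a bitmask, using the values listed in the comments of the sample nagios.cfg shipped with Nagios 3.x (assumed here; check them against your own copy). With this reading, debug_level=144 covers host/service checks (16) and external commands (128) but not scheduled events (8), which is the category most relevant to a stalled scheduling queue.

```python
from typing import Any, Dict, List

# Directives that may legitimately appear more than once in nagios.cfg
REPEATABLE = {"cfg_file", "cfg_dir", "broker_module"}

# Debug level bits as listed in the sample nagios.cfg comments for Nagios 3.x (assumption)
DEBUG_BITS = {
    1: "functions", 2: "configuration", 4: "process information",
    8: "scheduled events", 16: "host/service checks", 32: "notifications",
    64: "event broker", 128: "external commands", 256: "commands",
    512: "scheduled downtime", 1024: "comments", 2048: "macros",
}


def parse_main_config(text: str) -> Dict[str, Any]:
    """Parse nagios.cfg-style 'key=value' lines; repeatable keys become lists."""
    cfg: Dict[str, Any] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # split on the first '=' only: illegal_object_name_chars itself contains '='
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key in REPEATABLE:
            cfg.setdefault(key, []).append(value)
        else:
            cfg[key] = value  # last occurrence wins
    return cfg


def decode_debug_level(level: int) -> List[str]:
    """Translate a debug_level bitmask into category names (-1 means everything)."""
    if level == -1:
        return ["everything"]
    return [name for bit, name in sorted(DEBUG_BITS.items()) if level & bit]
```

Under the same assumption, adding the scheduled-events bit would give 144 | 8 = 152. Also, per the Nagios documentation, max_debug_file_size is given in bytes, so the value of 20 set above would make the debug file get renamed to .old almost as soon as anything is written to it.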

Re: [Nagios-users] Scheduling Queue stuck a few minutes after restart

2010-12-28 Thread Chris Beattie
Maurizio Pinotti wrote:
 I have a really odd issue running Nagios: a few minutes after starting the
 scheduling queue seems to freeze and no more active checks are performed. The
 queue remains stuck for hours until I have to manually restart Nagios.
 
 I'm running Nagios 3.0.6 (deb package) on a Debian lenny system. The hardware is

I think that is a bug in that version of Nagios.  I had the same 
problem.  It got fixed, but I still go look at my service checks every 
morning to make sure.  Also, I see where the server guys acknowledge 
problems and then forget about them, heh heh.

There is a much newer version of Nagios available in lenny-backports.  I 
would give it a shot if you can.

http://packages.debian.org/source/lenny-backports/backports/nagios3

-- 
-Chris
