On Thu, Mar 22, 2012 at 8:30 PM, Dan Frincu <[email protected]> wrote:
> Hi,
>
> On Wed, Mar 21, 2012 at 7:35 PM, Christoph Bartoschek
> <[email protected]> wrote:
>> Hi,
>>
>> after the incident yesterday we got everything up again. However,
>> since then ocf:pacemaker:ClusterMon has been sending an email every
>> 15 minutes although everything is fine.
>>
>> What could be wrong?
>
> My guess is that the default recheck interval is 15 minutes and that
> also triggers ClusterMon to send an email at that interval.
>
> IIRC, ClusterMon sends an email for each event (someone can correct me
> on this if I'm wrong),

Yeah, but unless the PE moves things around, there will be no events
and no email to send.
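The timer in question is the cluster-recheck-interval property, which
defaults to 15 minutes. If you want to rule it out, something along
these lines should work (crm shell syntax, as in the configuration
below; the 1h value is only an example):

  # Look for an explicit setting; no output means the 15min default.
  crm configure show | grep cluster-recheck-interval

  # Example only: stretch the recheck interval to one hour.
  crm configure property cluster-recheck-interval=1h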
> which got me to receive ~300 emails when a
> failover of a group (~26 resources in total) took place. I've not
> managed to make it send fewer emails, but maybe someone else can shed
> some light on this.

Could you send logs with -VVVVV appended to extra_options (so that we
get debug output from crm_mon)?
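That is, just the existing primitive with the -V flags added (untested;
each extra -V should raise crm_mon's log verbosity by one level):

primitive mail ocf:pacemaker:ClusterMon \
        op monitor interval="180" timeout="20" \
        params extra_options="--mail-to admin -VVVVV" htmlfile="/tmp/crm_mon.html"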
>
> HTH,
> Dan
>
>>
>> Here is the configuration of ClusterMon:
>>
>> primitive mail ocf:pacemaker:ClusterMon \
>>         op monitor interval="180" timeout="20" \
>>         params extra_options="--mail-to admin" htmlfile="/tmp/crm_mon.html"
>>
>> The logfile says:
>>
>> Mar 21 18:30:30 ries lrmd: [10225]: info: rsc:mail:1:142: monitor
>> Mar 21 18:30:32 ries crmd: [10228]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_pe_invoke: Query 435: Requesting the current CIB: S_POLICY_ENGINE
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_pe_invoke_callback: Invoking the PE: query=435, ref=pe_calc-dc-1332351032-363, seq=2004, quorate=1
>> Mar 21 18:30:32 ries pengine: [10227]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> Mar 21 18:30:32 ries pengine: [10227]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>> Mar 21 18:30:32 ries pengine: [10227]: info: determine_online_status: Node laplace is online
>> Mar 21 18:30:32 ries pengine: [10227]: info: find_clone: Internally renamed mail:1 on laplace to mail:2 (ORPHAN)
>> Mar 21 18:30:32 ries pengine: [10227]: info: determine_online_status: Node ries is online
>> Mar 21 18:30:32 ries pengine: [10227]: notice: unpack_rsc_op: Operation p_lsb_nfsserver:0_monitor_0 found resource p_lsb_nfsserver:0 active on ries
>> Mar 21 18:30:32 ries pengine: [10227]: info: find_clone: Internally renamed mail:0 on ries to mail:1
>> Mar 21 18:30:32 ries pengine: [10227]: notice: clone_print: Master/Slave Set: ms_drbd_nfs
>> Mar 21 18:30:32 ries pengine: [10227]: notice: short_print: Masters: [ ries ]
>> Mar 21 18:30:32 ries pengine: [10227]: notice: short_print: Slaves: [ laplace ]
>> Mar 21 18:30:32 ries pengine: [10227]: notice: clone_print: Clone Set: cl_lsb_nfsserver
>> Mar 21 18:30:32 ries pengine: [10227]: notice: short_print: Started: [ ries laplace ]
>> Mar 21 18:30:32 ries pengine: [10227]: notice: group_print: Resource Group: g_nfs
>> Mar 21 18:30:32 ries pengine: [10227]: notice: native_print: p_lvm_nfs#011(ocf::heartbeat:LVM):#011Started ries
>> Mar 21 18:30:32 ries pengine: [10227]: notice: native_print: p_fs_afs#011(ocf::heartbeat:Filesystem):#011Started ries
>> Mar 21 18:30:32 ries pengine: [10227]: notice: native_print: p_exportfs_afs#011(ocf::heartbeat:exportfs):#011Started ries
>> Mar 21 18:30:32 ries pengine: [10227]: notice: native_print: ClusterIP#011(ocf::heartbeat:IPaddr2):#011Started ries
>> Mar 21 18:30:32 ries pengine: [10227]: notice: clone_print: Clone Set: cl_mail
>> Mar 21 18:30:32 ries pengine: [10227]: notice: short_print: Started: [ laplace ries ]
>> Mar 21 18:30:32 ries pengine: [10227]: info: master_color: Promoting p_drbd_nfs:0 (Master ries)
>> Mar 21 18:30:32 ries pengine: [10227]: info: master_color: ms_drbd_nfs: Promoted 1 instances of a possible 1 to master
>> Mar 21 18:30:32 ries pengine: [10227]: info: master_color: Promoting p_drbd_nfs:0 (Master ries)
>> Mar 21 18:30:32 ries pengine: [10227]: info: master_color: ms_drbd_nfs: Promoted 1 instances of a possible 1 to master
>> Mar 21 18:30:32 ries pengine: [10227]: notice: RecurringOp: Start recurring monitor (15s) for p_drbd_nfs:0 on ries
>> Mar 21 18:30:32 ries pengine: [10227]: ERROR: create_notification_boundaries: Creating boundaries for ms_drbd_nfs
>> Mar 21 18:30:32 ries pengine: [10227]: ERROR: create_notification_boundaries: Creating boundaries for ms_drbd_nfs
>> Mar 21 18:30:32 ries pengine: [10227]: notice: RecurringOp: Start recurring monitor (15s) for p_drbd_nfs:0 on ries
>> Mar 21 18:30:32 ries pengine: [10227]: ERROR: create_notification_boundaries: Creating boundaries for ms_drbd_nfs
>> Mar 21 18:30:32 ries pengine: [10227]: ERROR: create_notification_boundaries: Creating boundaries for ms_drbd_nfs
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource p_drbd_nfs:0#011(Master ries)
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource p_drbd_nfs:1#011(Slave laplace)
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource p_lsb_nfsserver:0#011(Started ries)
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource p_lsb_nfsserver:1#011(Started laplace)
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource p_lvm_nfs#011(Started ries)
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource p_fs_afs#011(Started ries)
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource p_exportfs_afs#011(Started ries)
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource ClusterIP#011(Started ries)
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource mail:0#011(Started laplace)
>> Mar 21 18:30:32 ries pengine: [10227]: notice: LogActions: Leave resource mail:1#011(Started ries)
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> Mar 21 18:30:32 ries crmd: [10228]: info: unpack_graph: Unpacked transition 128: 1 actions in 1 synapses
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_te_invoke: Processing graph 128 (ref=pe_calc-dc-1332351032-363) derived from /var/lib/pengine/pe-input-29304.bz2
>> Mar 21 18:30:32 ries crmd: [10228]: info: te_rsc_command: Initiating action 18: monitor p_drbd_nfs:0_monitor_15000 on ries (local)
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_lrm_rsc_op: Performing key=18:128:8:991aee69-73be-4c2d-a1b8-c2d9c15fa83e op=p_drbd_nfs:0_monitor_15000 )
>> Mar 21 18:30:32 ries lrmd: [10225]: info: cancel_op: operation monitor[173] on ocf::drbd::p_drbd_nfs:0 for client 10228, its parameters: CRM_meta_clone=[0] drbd_resource=[data] CRM_meta_master_node_max=[1] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[2] CRM_meta_notify=[true] CRM_meta_master_max=[1] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] CRM_meta_operation=[monitor] CRM_meta_name=[monitor] CRM_meta_role=[Master] CRM_meta_interval=[15000] CRM_meta_timeout=[120000] cancelled
>> Mar 21 18:30:32 ries lrmd: [10225]: info: rsc:p_drbd_nfs:0:174: monitor
>> Mar 21 18:30:32 ries crmd: [10228]: info: process_lrm_event: LRM operation p_drbd_nfs:0_monitor_15000 (call=173, status=1, cib-update=0, confirmed=true) Cancelled
>> Mar 21 18:30:32 ries pengine: [10227]: info: process_pe_message: Transition 128: PEngine Input stored in: /var/lib/pengine/pe-input-29304.bz2
>> Mar 21 18:30:32 ries crmd: [10228]: info: process_lrm_event: LRM operation p_drbd_nfs:0_monitor_15000 (call=174, rc=8, cib-update=436, confirmed=false) master
>> Mar 21 18:30:32 ries crmd: [10228]: info: match_graph_event: Action p_drbd_nfs:0_monitor_15000 (18) confirmed on ries (rc=0)
>> Mar 21 18:30:32 ries crmd: [10228]: info: run_graph: ====================================================
>> Mar 21 18:30:32 ries crmd: [10228]: notice: run_graph: Transition 128 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-29304.bz2): Complete
>> Mar 21 18:30:32 ries crmd: [10228]: info: te_graph_trigger: Transition 128 is now complete
>> Mar 21 18:30:32 ries crmd: [10228]: info: notify_crmd: Transition 128 status: done - <null>
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>> Mar 21 18:30:32 ries crmd: [10228]: info: do_state_transition: Starting PEngine Recheck Timer
>>
>> Thanks
>> Christoph
>
> --
> Dan Frincu
> CCNA, RHCE

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
