Hi Hideo-san,

On Wed, Oct 10, 2012 at 03:22:08PM +0900, [email protected] wrote:
> Hi All,
>
> We found that Pacemaker cannot correctly judge the result of an lrmd
> operation.
>
> With the following crm configuration, a parameter of the start operation
> is returned to crmd as part of the result of the monitor operation.
>
> (snip)
> primitive prmDiskd ocf:pacemaker:Dummy \
>   params name="diskcheck_status_internal" device="/dev/vda" interval="30" \
>   op start interval="0" timeout="60s" on-fail="restart" prereq="fencing" \
>   op monitor interval="30s" timeout="60s" on-fail="restart" \
>   op stop interval="0s" timeout="60s" on-fail="block"
> (snip)
>
> This is because lrmd returns the prereq parameter of start in the result
> of the monitor operation.
> As a result, crmd judges that the parameters lrmd used for the monitor
> operation do not match the parameters crmd requested for it.
>
> We can confirm this problem with the following commands in Pacemaker 1.0.12.
>
> Command 1) The crm_verify command reports the difference in the digest.
>
> [root@rh63-heartbeat1 ~]# crm_verify -L
> crm_verify[19988]: 2012/10/10_20:29:58 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_30000 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
>
> Command 2) The ptest command reports the difference in the digest, too.
>
> [root@rh63-heartbeat1 ~]# ptest -L -VV
> ptest[19992]: 2012/10/10_20:30:19 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> ptest[19992]: 2012/10/10_20:30:19 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_30000 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
> [root@rh63-heartbeat1 ~]#
>
> Command 3) After cibadmin -B, pengine unnecessarily restarts the monitor
> of the resource.
>
> Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_30000 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
> Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: RecurringOp: Start recurring monitor (30s) for prmDiskd:0 on rh63-heartbeat1
> Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: LogActions: Leave resource prmDiskd:0#011(Started rh63-heartbeat1)
> Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: unpack_graph: Unpacked transition 2: 1 actions in 1 synapses
> Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1349868660-20) derived from /var/lib/pengine/pe-input-2.bz2
> Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: te_rsc_command: Initiating action 1: monitor prmDiskd:0_monitor_30000 on rh63-heartbeat1 (local)
> Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_lrm_rsc_op: Performing key=1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 op=prmDiskd:0_monitor_30000 )
> Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: cancel_op: operation monitor[4] on prmDiskd:0 for client 19839, its parameters: CRM_meta_clone=[0] CRM_meta_prereq=[fencing] device=[/dev/vda] name=[diskcheck_status_internal] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[30] prereq=[fencing] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_interval=[30000] CRM_meta_timeout=[60000] cancelled
> Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: rsc:prmDiskd:0 monitor[5] (pid 20009)
> Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_30000 (call=4, status=1, cib-update=0, confirmed=true) Cancelled
> Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: operation monitor[5] on prmDiskd:0 for client 19839: pid 20009 exited with return code 0
> Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: append_digest: #### yamauchi ####Calculated digest 7d7c9f601095389fc7cc0c6b29c61a7a for prmDiskd:0_monitor_30000 (0:0;1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6). Source: <parameters device="/dev/vda" name="diskcheck_status_internal" interval="30" prereq="fencing" CRM_meta_timeout="60000"/>
> Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_30000 (call=5, rc=0, cib-update=53, confirmed=false) ok
> Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: match_graph_event: Action prmDiskd:0_monitor_30000 (1) confirmed on rh63-heartbeat1 (rc=0)
>
> The problem is that crmd judges the digest to have changed even though
> the parameters have not changed at all.
>
> I made a patch.
> With the patch, lrmd always returns in the result only the parameters it
> received from crmd, and copies separately the parameters that are needed
> only when running the RA.
>
> My patch may have problems.
> Please review the contents of the patch.
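
The mismatch described above can be illustrated with a small sketch. Note that this is only an illustration under stated assumptions: params_digest() and strip_op_params() are hypothetical helpers, the hash (SHA-256) and the canonical string format are arbitrary stand-ins, and none of this is the actual digest code in crmd or lrmd. It only shows why one extra key such as prereq changes the digest, and why dropping it again makes the digests agree.

```python
import hashlib


def params_digest(params):
    """Hash a parameter set in a stable (sorted) order.

    Hypothetical stand-in for illustration; not Pacemaker's real digest.
    """
    canonical = " ".join(f'{key}="{value}"' for key, value in sorted(params.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def strip_op_params(params, op_only=("prereq",)):
    """Drop parameters that were defined on an operation (assumed list)."""
    return {key: value for key, value in params.items() if key not in op_only}


# Parameters requested for the monitor operation (values taken from the report).
requested = {
    "device": "/dev/vda",
    "name": "diskcheck_status_internal",
    "interval": "30",
    "CRM_meta_timeout": "60000",
}

# Parameters reported back: the start operation's prereq has leaked in.
reported = dict(requested, prereq="fencing")

# The extra key changes the digest, so the parameters look "changed".
print(params_digest(requested) == params_digest(reported))

# Once the operation-defined key is dropped, the digests agree again.
print(params_digest(requested) == params_digest(strip_op_params(reported)))
```

The first comparison corresponds to the "recorded ... vs. ..." message in the logs; the second corresponds to the behaviour expected once operation-defined parameters are no longer passed back.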

What the patch does is prevent lrmd from passing back the parameters
defined with the operation. Funnily enough, this code has been there
since 2006 (see LF bug 1301).

Well, it makes sense to me. It would be good if Andrew took a look
too.

And many thanks for the patch.

Cheers,

Dejan

> Best Regards,
> Hideo Yamauchi.
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
