Re: [Linux-HA] failcount for master/slave resource

Andrew Beekhof Thu, 24 Apr 2008 00:21:46 -0700

On Tue, Apr 22, 2008 at 4:01 AM, Junko IKEDA <[EMAIL PROTECTED]> wrote:
>  > >  I have one master/slave resource.
>  > >  (Heartbeat 2.2.0 + Pacemaker 0.6.2)
>  > >
>  > >  Master/Slave Set: ms-sf
>  > >  stateful-1:0 (ocf::heartbeat:Stateful):Master node-b
>  > >  stateful-1:1 (ocf::heartbeat:Stateful):Started node-a
>  > >
>  > >  If stateful-1:0 fails, crm_mon shows this:
>  > >
>  > >  Master/Slave Set: ms-sf
>  > >  stateful-1:0 (ocf::heartbeat:Stateful):Stopped
>  > >  stateful-1:1 (ocf::heartbeat:Stateful):Master node-a
>  > >
>  > >  Failed actions:
>  > >     stateful-1:0_demote_0 (node=node-b, call=7, rc=7): complete
>  > >
>  > >  I tried to clear the failcount of stateful-1:0 with crm_failcount.
>  >
>  > That doesn't remove the failed operation though... only the counter
>  > which tracks how many times the resource failed.
>  >
>  > Perhaps try crm_resource -C
>
>  ok, I tried this.
>
>  (1) run the resource
>
>
>  Master/Slave Set: ms-sf
>  stateful-1:0 (ocf::heartbeat:Stateful):Master node-b
>  stateful-1:1 (ocf::heartbeat:Stateful):Started node-a
>
>
>  (2) break the master resource
>
>  # rm -f /var/run/heartbeat/rsctmp/Stateful-stateful-1\:0.state
>
>
>  Master/Slave Set: ms-sf
>  stateful-1:0 (ocf::heartbeat:Stateful):Stopped
>  stateful-1:1 (ocf::heartbeat:Stateful):Master node-a
>
>  Failed actions:
>  stateful-1:0_demote_0 (node=node-b, call=7, rc=7): complete
>
>
>  (3) clean up the master resource
>
>  # crm_resource -C -r stateful-1:0 -H node-b
>
>
>  Master/Slave Set: ms-sf
>  stateful-1:0 (ocf::heartbeat:Stateful):Stopped
>  stateful-1:1 (ocf::heartbeat:Stateful):Master node-a
>
>
>  (4) reset the failcount to "0"
>
>
>  # crm_failcount -r stateful-1:0 -U node-b -D
>
>
>  Master/Slave Set: ms-sf
>  stateful-1:0 (ocf::heartbeat:Stateful):Master node-b
>  stateful-1:1 (ocf::heartbeat:Stateful):Stopped
>
>
>  node-b became master again,
>  but stateful-1:1 on node-a stopped instead of running as slave (status Started).
>
>  At this point, the failcount for stateful-1:1 on node-a has been incremented.
>
>  # cibadmin -Q | grep fail-count
>  <nvpair
>  id="status-c53511b5-7568-426e-bbd5-f258e24aa9ac-fail-count-stateful-1:1"
>  name="fail-count-stateful-1:1" value="1"/>
>
>  Does this need to be counted?


"Sort of."

Given what happened, it is correct that the failcount was incremented.
The problem is that what happened was itself incorrect: the monitor-1s op
was executed _before_ the instance was demoted (which is clearly wrong).

Fixed in: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/e105f4e7a3cf
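
For reference, the cleanup sequence from steps (3) and (4) in the quoted
report can be sketched as a small shell script. It uses only the two commands
and options already shown in this thread. The `run` helper, the `DRY_RUN`
variable, and the `cleanup_failed_instance` function name are illustrative
assumptions and not part of Heartbeat or Pacemaker. By default the script only
prints the commands, so it is safe to try on a machine without a cluster.

```shell
#!/bin/sh
# Sketch only: clear a failed clone instance, then reset its failcount.
# Resource and node names are the ones used in this thread.
RSC="stateful-1:0"
NODE="node-b"

# Hypothetical switch: 1 (default) prints the commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

cleanup_failed_instance() {
    # $1 = resource instance, $2 = node
    # Step (3): remove the failed operation from the status section.
    run crm_resource -C -r "$1" -H "$2"
    # Step (4): delete the failcount attribute for this instance on this node.
    run crm_failcount -r "$1" -U "$2" -D
}

cleanup_failed_instance "$RSC" "$NODE"
```

To execute the commands on a real cluster node instead of printing them, run
the script with DRY_RUN=0. Afterwards, `cibadmin -Q | grep fail-count` (as in
the report above) shows whether any fail-count attributes remain.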
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
