On Tue, Apr 22, 2008 at 4:01 AM, Junko IKEDA <[EMAIL PROTECTED]> wrote:
> > > I have one master/slave resource.
> > > (Heartbeat 2.2.0 + Pacemaker 0.6.2)
> > >
> > > Master/Slave Set: ms-sf
> > >     stateful-1:0 (ocf::heartbeat:Stateful):Master node-b
> > >     stateful-1:1 (ocf::heartbeat:Stateful):Started node-a
> > >
> > > If stateful-1:0 fails, crm_mon would show like this;
> > >
> > > Master/Slave Set: ms-sf
> > >     stateful-1:0 (ocf::heartbeat:Stateful):Stopped
> > >     stateful-1:1 (ocf::heartbeat:Stateful):Master node-a
> > >
> > > Failed actions:
> > >     stateful-1:0_demote_0 (node=node-b, call=7, rc=7): complete
> > >
> > > I tried to clear the failcount of stateful-1:0 with crm_failcount.
> >
> > That doesn't remove the failed operation though... only the counter
> > which tracks how many times the resource failed.
> >
> > Perhaps try crm_resource -C
>
> ok, I tried this.
>
> (1) run the resource
>
> Master/Slave Set: ms-sf
>     stateful-1:0 (ocf::heartbeat:Stateful):Master node-b
>     stateful-1:1 (ocf::heartbeat:Stateful):Started node-a
>
> (2) break master resource
>
> # rm -f /var/run/heartbeat/rsctmp/Stateful-stateful-1\:0.state
>
> Master/Slave Set: ms-sf
>     stateful-1:0 (ocf::heartbeat:Stateful):Stopped
>     stateful-1:1 (ocf::heartbeat:Stateful):Master node-a
>
> Failed actions:
>     stateful-1:0_demote_0 (node=node-b, call=7, rc=7): complete
>
> (3) clear master resource
>
> # crm_resource -C -r stateful-1:0 -H node-b
>
> Master/Slave Set: ms-sf
>     stateful-1:0 (ocf::heartbeat:Stateful):Stopped
>     stateful-1:1 (ocf::heartbeat:Stateful):Master node-a
>
> (4) get back the failcount to "0"
>
> # crm_failcount -r stateful-1:0 -U node-b -D
>
> Master/Slave Set: ms-sf
>     stateful-1:0 (ocf::heartbeat:Stateful):Master node-b
>     stateful-1:1 (ocf::heartbeat:Stateful):Stopped
>
> node-b could be master again,
> but stateful-1:1 on node-a stopped instead of being slave(status Started).
>
> at this time, the failcount for stateful-1:1/node-a is counted.
>
> # cibadmin -Q | grep fail-count
>   <nvpair
> id="status-c53511b5-7568-426e-bbd5-f258e24aa9ac-fail-count-stateful-1:1"
> name="fail-count-stateful-1:1" value="1"/>
>
> Is it needed to be counted?

"sort of"

Given what happened, it is correct that the failcount was incremented.
The problem is that what happened was incorrect... the monitor-1s op
was being executed _before_ the instance was being demoted (which is
clearly wrong).

Fixed in:
   http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/e105f4e7a3cf
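
For anyone who wants to check which fail-counts are still set after the cleanup steps, here is a minimal sketch of a filter for the cibadmin query quoted above. The helper name list_failcounts is made up for illustration, and it assumes the nvpair layout shown in this thread (name="fail-count-<resource>" immediately followed by value="<n>"); other Pacemaker versions may lay the status section out differently.

```shell
# Hypothetical helper: print "<resource> <fail-count>" for every
# fail-count attribute found in CIB XML read from stdin.
# Assumes name="fail-count-..." is directly followed by value="...",
# as in the cibadmin output quoted in this thread.
list_failcounts() {
    # Join wrapped lines first, since an nvpair element may span several lines.
    tr '\n' ' ' |
        grep -o 'name="fail-count-[^"]*" value="[^"]*"' |
        sed -e 's/^name="fail-count-\([^"]*\)" value="\([^"]*\)"$/\1 \2/'
}

# Possible use on a cluster node (not run here):
#   cibadmin -Q | list_failcounts
```

An empty result would suggest that no fail-count attributes are present in the status section.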
