Thanks for the quick response, Andrew.
'crm_resource -C -r OpenSer' seems to work, but I do get an error saying
that last-lrm-refresh could not be set:
Nov 13 14:00:12 box01 crm_resource: [11391]: ERROR: cib_native_perform_op: Call failed: The object/attribute does not exist
Nov 13 14:00:12 box01 crm_resource: [11391]: ERROR: update_attr: Error setting last-lrm-refresh=1194962406 (section=crm_config, set=cib-bootstrap-options): The object/attribute does not exist
The resource does, however, fail back when I do that AND set the
fail-count to 0 on both the primary and the backup.
It won't fail back unless fail-count is defined on the backup, and
there the fail-count is initially undefined:
(box01:~) # crm_failcount -G -r OpenSer -U box02
name=fail-count-OpenSer value=(null)
Error performing operation: The object/attribute does not exist
Because the service previously failed to start on the primary (box01),
the fail-count is defined there. Once I define the fail-count on the
backup (box02)
(box01:~) # crm_failcount -v 0 -r OpenSer -U box02
(box01:~) # crm_failcount -G -r OpenSer -U box02
name=fail-count-OpenSer value=0
it migrates back as expected. I suppose I should add a "set fail-count
to 0" step for both box01 and box02 to my startup scripts, so that
merely running 'crm_resource -C -r OpenSer' migrates the service back
after the initial failure.
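For concreteness, here is a rough sketch of the startup-script step I have
in mind. It only uses the crm_failcount and crm_resource invocations shown
above. The function name reset_failcounts and the CRM_RUN variable are my
own inventions for illustration (CRM_RUN=echo just prints the commands
instead of running them), so treat this as an untested outline rather than
a recommended procedure:

```shell
#!/bin/sh
# Sketch only: define fail-count as 0 for one resource on each listed node,
# then clean up the resource so the cluster re-evaluates where it may run.
# CRM_RUN is a hypothetical dry-run hook, not part of any Linux-HA tool:
# set CRM_RUN=echo to print the commands instead of executing them.

reset_failcounts() {
    rsc="$1"
    shift
    for node in "$@"; do
        # Set the count to 0 explicitly so it is defined on every node
        ${CRM_RUN:-} crm_failcount -v 0 -r "$rsc" -U "$node" || return 1
    done
    # Same cleanup command as used earlier in this message
    ${CRM_RUN:-} crm_resource -C -r "$rsc"
}

# Example call (left commented out so sourcing this file has no side effects):
# reset_failcounts OpenSer box01 box02
```

Running it with CRM_RUN=echo first shows exactly which commands would be
issued before anything touches the cluster.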
Is there a better way to be doing this?
-Anders
Andrew Beekhof wrote:
prior to the latest interim build, start failures were always fatal and
required the use of crm_resource -C to make the node eligible again.
as of the last interim release, just make sure
start-failure-is-fatal=false and use crm_failcount as you have below
for "normal" failures.
Additionally, I followed the advice under "Resetting Failure Counts"
in the V2 FAQ ( http://linux-ha.org/v2/faq ) where it suggests:
crm_failcount -D -U nodeA -r my_rsc
Rather than reset the failure count, this just torches it in such a
way that you can't even read it with the query command given in the
next step of the same example. I found statically setting the count
back to 0 with:
crm_failcount -v 0 -U box01 -r OpenSer
worked much better and allowed me to push resources back and forth
just by moving the fail count up and down.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems