On Nov 13, 2007, at 5:17 PM, Anders Brownworth wrote:
Thanks for the quick response, Andrew.
'crm_resource -C -r OpenSer' seems to work but I do get an error
about last-lrm-refresh not being able to be set:
Nov 13 14:00:12 box01 crm_resource: [11391]: ERROR:
cib_native_perform_op: Call failed: The object/attribute does not
exist
Nov 13 14:00:12 box01 crm_resource: [11391]: ERROR: update_attr:
Error setting last-lrm-refresh=1194962406 (section=crm_config,
set=cib-bootstrap-options): The object/attribute does not exist
This shouldn't be important.
What version are you running?
The resource does, however, fail back when I do that AND set the
fail-count to 0 on the primary and backup.
But the resource won't fail back unless fail-count is defined on the
backup. The fail-count is initially undefined:
(box01:~) # crm_failcount -G -r OpenSer -U box02
name=fail-count-OpenSer value=(null)
Error performing operation: The object/attribute does not exist
Because the service previously failed to start on the primary (box01), the fail-count is defined there. Once I define the fail-count on the backup (box02):
(box01:~) # crm_failcount -v 0 -r OpenSer -U box02
(box01:~) # crm_failcount -G -r OpenSer -U box02
name=fail-count-OpenSer value=0
it migrates back as expected.
That's really weird (and looks like a bug).
Can you try with a later version?
Unless it's not important what the update contains, just that there is one... so the TE gets triggered and does the migration. That's what the "last-lrm-refresh" code above is supposed to be doing. That not working could cause this kind of behavior.
I suppose I should add a "set fail-count to 0" for both box01 and
box02 in my startup scripts so merely doing a 'crm_resource -C -r
OpenSer' migrates the service back after the initial failure.
Is there a better way to be doing this?
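To make the workaround concrete, here is a sketch of what I mean by "set fail-count to 0 in my startup scripts" (node names and the resource id are from my setup; whether this is the right approach is exactly what I'm asking):

```shell
#!/bin/sh
# Sketch of the workaround: explicitly define fail-count as 0 on both
# nodes so that the attribute exists on each of them. Node names and
# the resource id (OpenSer) are specific to my setup.
for node in box01 box02; do
    crm_failcount -v 0 -r OpenSer -U "$node"
done

# With the fail-counts defined, a plain cleanup should be enough to
# let the resource migrate back after a failure:
crm_resource -C -r OpenSer
```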
-Anders
Andrew Beekhof wrote:
Prior to the latest interim build, start failures were always fatal and required the use of crm_resource -C to make the node eligible again.
As of the last interim release, just make sure start-failure-is-fatal=false and use crm_failcount as you have below for "normal" failures.
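For completeness, that cluster option lives in the crm_config section and can be set with crm_attribute; this is a sketch assuming a stock configuration:

```shell
# Sketch: make start failures non-fatal (cluster-wide option in crm_config).
crm_attribute -t crm_config -n start-failure-is-fatal -v false

# Read the option back to confirm it took effect:
crm_attribute -t crm_config -n start-failure-is-fatal -G
```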
Additionally, I followed the advice under "Resetting Failure
Counts" in the V2 FAQ ( http://linux-ha.org/v2/faq ) where it
suggests:
crm_failcount -D -U nodeA -r my_rsc
Rather than reset the failure count, this just torches it (the attribute is deleted entirely), so you can't even read it with the query command given in the next step of the same example. I found explicitly setting the count back to 0 with:
crm_failcount -v 0 -U box01 -r OpenSer
worked much better and allowed me to push resources back and forth
just by moving the fail count up and down.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems