On Nov 13, 2007, at 5:17 PM, Anders Brownworth wrote:

Thanks for the quick response, Andrew.

'crm_resource -C -r OpenSer' seems to work but I do get an error about last-lrm-refresh not being able to be set:

Nov 13 14:00:12 box01 crm_resource: [11391]: ERROR: cib_native_perform_op: Call failed: The object/attribute does not exist
Nov 13 14:00:12 box01 crm_resource: [11391]: ERROR: update_attr: Error setting last-lrm-refresh=1194962406 (section=crm_config, set=cib-bootstrap-options): The object/attribute does not exist

This shouldn't be important.
What version are you running?

The resource does, however, fail back when I do that AND set the fail-count to 0 on the primary and backup.

But the resource won't fail back unless fail-count is defined on the backup. The fail-count is initially undefined:

(box01:~) # crm_failcount -G -r OpenSer -U box02
name=fail-count-OpenSer value=(null)
Error performing operation: The object/attribute does not exist

Because the service previously failed to start on the primary (box01), the fail-count is defined there. Once I define the fail-count on the backup (box02)

(box01:~) # crm_failcount -v 0 -r OpenSer -U box02
(box01:~) # crm_failcount -G -r OpenSer -U box02
name=fail-count-OpenSer value=0

it migrates back as expected.

That's really weird (and looks like a bug).
Can you try with a later version?

Unless it's not important what the update contains and it just matters that there is one... so the TE gets triggered and does the migration.

That's what the "last-lrm-refresh" code above is supposed to be doing. If that isn't working, it could cause this kind of behavior.

I suppose I should add a "set fail-count to 0" step for both box01 and box02 to my startup scripts, so that merely doing a 'crm_resource -C -r OpenSer' migrates the service back after the initial failure.
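A minimal sketch of such a startup hook, using the node names (box01, box02) and resource name (OpenSer) from this thread; the crm_failcount and crm_resource invocations are the same ones used above, but the script itself is only an illustration, not a tested snippet:

```shell
#!/bin/sh
# Hypothetical startup hook: clear stale failure state for the OpenSer
# resource on both nodes, so a plain cleanup is enough to fail back.
RESOURCE=OpenSer

for NODE in box01 box02; do
    # Explicitly set the fail-count to 0 (defining it if it was
    # previously undefined), matching the workaround described above.
    crm_failcount -v 0 -r "$RESOURCE" -U "$NODE"
done

# Re-probe the resource so the CRM notices the clean state.
crm_resource -C -r "$RESOURCE"
```

This only runs usefully on a node with a live Heartbeat CRM stack, so it is shown as a fragment rather than something to execute standalone.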

Is there a better way to be doing this?

-Anders

Andrew Beekhof wrote:
prior to the latest interim build, start failures were always fatal and required the use of crm_resource -C to make the node eligible again.

as of the last interim release, just make sure start-failure-is-fatal=false and use crm_failcount as you have below for "normal" failures.
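For reference, a cluster option like this would typically be set in the crm_config section of the CIB; one way to do that from the shell is crm_attribute, though whether this exact invocation applies to your interim build is an assumption:

```shell
# Hypothetical: set the cluster-wide option Andrew mentions.
# -t selects the crm_config section, -n the option name, -v its value.
crm_attribute -t crm_config -n start-failure-is-fatal -v false
```

This is a configuration fragment against a running cluster, so it is not runnable outside that context.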

Additionally, I followed the advice under "Resetting Failure Counts" in the V2 FAQ ( http://linux-ha.org/v2/faq ) where it suggests:

crm_failcount -D -U nodeA -r my_rsc

Rather than resetting the failure count to 0, this deletes the attribute entirely, so you can't even read it with the query command given in the next step of the same example. I found that statically setting the count back to 0 with:

crm_failcount -v 0 -U box01 -r OpenSer

worked much better and allowed me to push resources back and forth just by moving the fail count up and down.
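Since the thread reports that resources can be pushed back and forth just by moving the fail count, here is a hedged sketch of what that looks like; the specific value used to push the resource away is an arbitrary illustration, and how a high fail-count interacts with failure stickiness depends on your cluster configuration:

```shell
# Hypothetical illustration of "moving the fail count up and down".
# Raise the fail-count on box01 to make the resource move away...
crm_failcount -v 1000000 -r OpenSer -U box01

# ...then reset it to 0 to allow the resource back onto box01.
crm_failcount -v 0 -r OpenSer -U box01
```

As with the other fragments, these commands only make sense against a live Heartbeat CRM cluster.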

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
