From: [EMAIL PROTECTED] on behalf of Lars Marowsky-Bree
Sent: Fri 15.08.2008 16:19

>On 2008-08-15T13:13:46, "Krauth, Alexander" <[EMAIL PROTECTED]> wrote:
>
>> Situation:
>> Running a two node cluster. A master/slave resource with clone_max = 2,
>> clone_node_max = 1, master_max = 1,  master_node_max = 1.
>> Startup is working, that means 1 clone is running as a master and the
>> other clone is running as a slave on the second node. Now I kill the
>> application processes of the master resource. The RA recognizes this and
>> returns OCF_FAILED_MASTER in the next monitor operation.
>> 
>> My expectation:
>> 1. Promote the existing slave on the second node to the new master.
>> 2. Restart the old master on the first node as a slave.
>> 
>> What happens is:
>> 1. Restart the master as master on the first node.
>> 2. Nothing happens to the still running slave.
>> 
>> Is this the intended behavior? Are there any attributes to change it
>> (e.g. setting -INFINITY fail-counts for _master_ resources only)?
>> I could provide a hb_report on this, if needed.
>
> This is because of the crm_master preference.
>
> First, recovery of the slaves is tried - as your first node is still
> eligible to run a slave, a slave is started.
>
> Then, the PE looks at the crm_master values, which indicate which side
> is preferable. If they are equal, it amounts to a random pick, and you
> end up with the first node.
>
> The alternative - promoting the second slave first, then restarting the
> slave, and comparing preferences again - might cause the master to shift
> once more than necessary.
>
> So the behaviour you describe is expected as of now (and "correct", even
> if not optimal in your use case), but we welcome a thought-out proposal
> of how to configure & describe a better one ;-)
>
>
> Regards,
>     Lars
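For reference, the master/slave setup described at the top might be written roughly as follows in the later Pacemaker crm shell syntax (the resource names ms_app and app are placeholders, and Heartbeat 2.1.3 itself would express the same attributes in CIB XML):

```
# Hypothetical master/slave resource; "ms_app" and "app" are
# placeholder names, not from the original setup.
crm configure ms ms_app app \
    meta clone-max=2 clone-node-max=1 \
         master-max=1 master-node-max=1 notify=true
```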

Thanks Lars for the explanation. That helped a lot.
 
I solved it that way:
 
Operation start: crm_master -v 100
That applies to all clone instances, so one of them gets promoted to Master.
 
Operation monitor: If OCF_FAILED_MASTER: crm_master -v 10
That means a failed master will lower its own ambition to be the Master,
and an existing slave will be promoted instead.
 
Operation notify_post_promote: crm_master -v 100 
All back to default.
 
In case of a node failure the surviving slave gets the Master anyway,
so that is also covered.
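The scheme above can be sketched in OCF shell form. This is only an illustrative sketch, not the actual SAPInstance code: check_processes stands in for the RA's real process check, and crm_master is stubbed so the fragment can run outside a cluster.

```shell
#!/bin/sh
# Illustrative sketch of the crm_master scheme described above;
# not the actual SAPInstance RA code.

OCF_SUCCESS=0
OCF_FAILED_MASTER=9   # OCF return code for a failed master

# Stub: record the preference instead of calling the real crm_master
# binary, so the sketch runs outside a cluster.
crm_master() { MASTER_PREF="$2"; }

# Placeholder for the RA's real application-process check.
check_processes() { true; }

app_start() {
    # ... start the application processes here ...
    crm_master -v 100   # default: every instance is a promotion candidate
    return $OCF_SUCCESS
}

app_monitor() {
    # In the real RA this branch only applies when running as master.
    if ! check_processes; then
        # The master lost its processes: lower our own promotion
        # preference so the PE promotes the surviving slave instead
        # of re-promoting us.
        crm_master -v 10
        return $OCF_FAILED_MASTER
    fi
    return $OCF_SUCCESS
}

app_notify() {
    # Once the promotion has happened, restore the default preference.
    if [ "$OCF_RESKEY_CRM_meta_notify_type" = "post" ] &&
       [ "$OCF_RESKEY_CRM_meta_notify_operation" = "promote" ]; then
        crm_master -v 100
    fi
    return $OCF_SUCCESS
}
```

With equal values (100) on both nodes the PE's pick is arbitrary, as Lars describes; dropping the failed master to 10 makes the surviving slave the clear winner.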
I tested this on my cluster (still 2.1.3) and it worked fine. I attach the
new SAPInstance RA and a short description, in case someone would like to test it.
 
Regards
Alex


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
