From: [EMAIL PROTECTED] on behalf of Lars Marowsky-Bree
Sent: Fri 2008-08-15 16:19
> On 2008-08-15T13:13:46, "Krauth, Alexander" <[EMAIL PROTECTED]> wrote:
>
> > Situation:
> > Running a two-node cluster. A master/slave resource with clone_max = 2,
> > clone_node_max = 1, master_max = 1, master_node_max = 1.
> > Startup works, that is, one clone instance is running as master and the
> > other clone instance is running as slave on the second node. Now I kill
> > the application processes of the master resource. The RA recognizes this
> > and returns OCF_FAILED_MASTER in the next monitor operation.
> >
> > My expectation:
> > 1. Promote the existing slave on the second node to the new master.
> > 2. Restart the old master on the first node as a slave.
> >
> > What happens is:
> > 1. The master is restarted as master on the first node.
> > 2. Nothing happens to the still-running slave.
> >
> > Is this the intended behavior? Are there any attributes to change it
> > (setting some -INFINITY fail-counts for _master_ resources only)?
> > I could provide a hb_report on this, if needed.
>
> This is because of the crm_master preference.
>
> First, recovery of the slaves is tried - as your first node is still
> eligible to run a slave, a slave is started.
>
> Then, the PE looks at the crm_master values, which indicate which side
> is preferable. If they are equal, it amounts to a random pick, and you
> end up with the first node.
>
> The alternative - promoting the second slave first, then restarting the
> slave, and comparing preferences again - might cause the master to shift
> once more than necessary.
>
> So the behaviour you describe is expected as of now (and "correct", even
> if not optimal in your use case), but we welcome a thought-out proposal
> of how to configure & describe a better one ;-)
>
> Regards,
> Lars

Thanks, Lars, for the explanation. That helped a lot. I solved it this way:

Operation start:
    crm_master -v 100
That sets the same value for all clone instances, and one of them becomes master.

Operation monitor:
    If OCF_FAILED_MASTER: crm_master -v 10
A failed master thereby lowers its own ambition to be master, and an existing slave is promoted instead.

Operation notify (post-promote):
    crm_master -v 100
All back to the default. In case of a node failure the surviving slave becomes master anyway, so that case is also covered.

I tested this on my cluster (still 2.1.3) and it worked fine. I attach the new SAPInstance RA and a short description, if someone would like to test it.

Regards
Alex
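The three operations above can be sketched as a shell fragment in the style of an OCF resource agent. This is only an illustration of the scoring scheme, not the attached SAPInstance RA itself: the helper functions (app_is_running, app_is_master, app_master_failed) are hypothetical placeholders, and crm_master is stubbed out with an echo so the sketch runs outside a cluster; a real RA calls the actual Pacemaker/Heartbeat crm_master binary.

```shell
#!/bin/sh
# Master-preference scheme: healthy instances advertise 100, a failed
# master degrades itself to 10 so the healthy slave wins the promotion.

SCORE_DEFAULT=100   # normal preference: every instance may become master
SCORE_DEGRADED=10   # a failed master drops below any healthy slave

# Stub so this sketch runs outside a cluster; the real binary updates
# the node's master preference attribute in the CIB.
crm_master() { echo "crm_master $*"; }

# Hypothetical placeholders for the RA's real health checks.
app_is_running()    { return 0; }
app_is_master()     { return 1; }
app_master_failed() { return 1; }

sap_start() {
    # ... start the application here ...
    crm_master -v "$SCORE_DEFAULT"
}

sap_monitor() {
    if ! app_is_running; then
        return 7                    # OCF_NOT_RUNNING
    fi
    if app_is_master; then
        if app_master_failed; then
            # Degrade our own ambition: the healthy slave now holds
            # the higher score, so the PE promotes it instead of us.
            crm_master -v "$SCORE_DEGRADED"
            return 9                # OCF_FAILED_MASTER
        fi
        return 8                    # OCF_RUNNING_MASTER
    fi
    return 0                        # OCF_SUCCESS, running as slave
}

sap_notify() {
    # After a successful promotion anywhere, restore the default score
    # so the next failover starts from a level playing field.
    if [ "$OCF_RESKEY_CRM_meta_notify_type" = "post" ] \
       && [ "$OCF_RESKEY_CRM_meta_notify_operation" = "promote" ]; then
        crm_master -v "$SCORE_DEFAULT"
    fi
}
```

Because both instances normally carry the same score of 100, the degraded value of 10 only matters at failover time; the post-promote notification then resets it so a later failure on the other node is handled the same way.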
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
