Two points to consider if we think about having such a preference rule:

(1) We will need to define a "distinction" (or "preferred node") rule. We could follow an "all resource groups or none" approach: if our node/partition does not host *any* resource groups (which means the other node/partition probably hosts them all), we introduce a delay in our reconfiguration to give the other partition a chance to win. Or we could let the node/partition hosting fewer resource groups than the other introduce a delay in its CMM reconfiguration (the race to acquire quorum). Another thing we would need to think over: with multiple rules for the delay (size of partition, resource groups hosted by partition), how do we prioritize them as deciders? So there are multiple deciding parameters to be considered. The cluster administrator might also want a say in which partition delays itself, so this might need to be tunable. Theoretically, it seems we can have such a "preference" rule; a rough sketch of how the deciders could be combined follows below.
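Just to make that concrete, here is a minimal sketch (in C) of how such a delay decision might be computed. It is only an illustration of the idea, not how CMM actually works: the structure names, fields, weights and base delay are all made up for the example and would have to come from tunables and from whatever view of the resource groups CMM can safely obtain.

#include <stdint.h>

/*
 * Hypothetical view of one partition at reconfiguration time.
 * Field names are illustrative only.
 */
typedef struct {
    uint32_t num_nodes;        /* nodes in this partition     */
    uint32_t num_rgs_hosted;   /* resource groups hosted here */
} partition_info_t;

/*
 * Tunables an administrator could set to prioritize the deciders.
 * A weight of 0 disables that rule.
 */
typedef struct {
    uint32_t weight_partition_size;  /* "bigger partition wins"  */
    uint32_t weight_rgs_hosted;      /* "active partition wins"  */
    uint32_t base_delay_ms;          /* delay unit for the loser */
} delay_tunables_t;

/*
 * Return the delay (in ms) that *our* partition adds before racing for
 * the quorum device.  0 means "race immediately".  The delay is bounded,
 * so if the preferred partition is actually dead we still go on to
 * acquire quorum once the delay expires.
 */
static uint32_t
compute_reconf_delay(const partition_info_t *mine,
    const partition_info_t *peer, const delay_tunables_t *tun)
{
    int32_t score = 0;

    /* Rule 1: the larger partition is preferred. */
    if (mine->num_nodes < peer->num_nodes)
        score += (int32_t)tun->weight_partition_size;
    else if (mine->num_nodes > peer->num_nodes)
        score -= (int32_t)tun->weight_partition_size;

    /* Rule 2: the partition hosting more resource groups is preferred. */
    if (mine->num_rgs_hosted < peer->num_rgs_hosted)
        score += (int32_t)tun->weight_rgs_hosted;
    else if (mine->num_rgs_hosted > peer->num_rgs_hosted)
        score -= (int32_t)tun->weight_rgs_hosted;

    /* A positive score means we are the less preferred side: back off. */
    return (score > 0) ? (uint32_t)score * tun->base_delay_ms : 0;
}

Of course, whether CMM can even know the peer partition's resource group count at that point is exactly the concern in (2) below.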
(2) One important point to note is that the resource/resource group information lives with the RGM. We will obviously want to check whether making CMM obtain information from RGM for its "delay" decision in a reconfiguration is a good idea or not, knowing that RGM itself depends on CMM for membership information.

Thanks & Regards,
Sambit

Hartmut Streppel wrote:
> Isn't there an algorithm already in place that adapts the priority to
> start the race for quorum based on:
> - node id (??)
> - size of partition (i.e. a higher number of nodes gets priority)?
> If yes, this could easily be enhanced to include active/passive as
> well - maybe this should be made configurable.
>
> Regards
> Hartmut
>
>
> On 07/13/09 10:32, Tirthankar wrote:
>>
>> On 07/13/09 13:54, Sergei Kolodka wrote:
>>> Thanks Hartmut,
>>>
>>> It took me a couple of minutes to actually read about split-brain
>>> clusters, and I've edited my initial post a bit :-)
>>>
>>> Anyway, how about defining the inactive node in an active/passive
>>> cluster as the node without any resource groups running, and
>>> introducing some kind of check for resource groups before the nodes
>>> start eliminating each other? I.e. if I'm not running anything, I'm
>>> inactive and can wait a second or two before deleting the active
>>> node's key, deliberately letting the active node win the shoot-out.
>>
>> In a cluster all nodes are considered equal. Now say you have a
>> 2-node cluster: node1 is running all the RGs and node2 isn't. There
>> are a couple of scenarios:
>>
>> 1. A real split brain happens.
>> Both nodes are up but there is a network disconnect. In this case,
>> giving priority to node1 makes sense, and node2 commits suicide.
>>
>> 2. node1 panics.
>> To node2, this still looks like a split brain, as it cannot contact
>> node1. If the algorithm is modified to give priority to node1, you
>> will have a full cluster outage.
>>
>> Hence, in order to make the algorithm work in all scenarios, we do
>> not give priority based on which groups are hosted on which nodes.
>>
>> Though I guess the algorithm could be enhanced.
>>
>>> I'm not sure if timestamps are saved in the quorum database, but if
>>> they are, a simple check of whether the primary node of the
>>> split-brain cluster was still reachable from the quorum server after
>>> the standby node lost contact with it could tell the standby node
>>> that the primary is not so dead and might be able to continue
>>> working. In a real-life situation, a node under load has no chance
>>> of winning this race against a node that is just sitting and doing
>>> nothing. But I'm pretty sure I'm missing something important here.
>>>
>>> Regards,
>>> Sergei
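Just to illustrate why a *bounded* delay (as opposed to an absolute preference for the node hosting the resource groups) should not run into the full-cluster-outage problem Tirthankar describes in scenario 2: if the preferred node is actually dead, the delayed node still races for the quorum device once its back-off expires. The toy model below is purely an illustration under that assumption - abstract time ticks, no real quorum device protocol, made-up names:

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model of the quorum race in a 2-node cluster.  The node that is
 * alive and has the smallest back-off "wins" (reserves the quorum device).
 */
typedef struct {
    bool     alive;        /* is this node actually up?         */
    unsigned delay_ticks;  /* back-off before racing for quorum */
} node_t;

/* Returns the id (0 or 1) of the node that acquires quorum, or -1 if none. */
static int
race_for_quorum(const node_t n[2])
{
    int winner = -1;
    unsigned best = (unsigned)-1;

    for (int i = 0; i < 2; i++) {
        if (n[i].alive && n[i].delay_ticks < best) {
            best = n[i].delay_ticks;
            winner = i;
        }
    }
    return winner;
}

int
main(void)
{
    /* Scenario 1: real split brain - both up, the passive node backs off. */
    node_t split[2] = { { true, 0 }, { true, 5 } };   /* node0 is active */
    printf("split brain: node%d survives\n", race_for_quorum(split));

    /*
     * Scenario 2: the active node panicked - the passive node still wins
     * once its delay expires, so there is no full cluster outage.
     */
    node_t dead_active[2] = { { false, 0 }, { true, 5 } };
    printf("active node dead: node%d survives\n", race_for_quorum(dead_active));
    return 0;
}

Running this prints node0 as the survivor in the split-brain case and node1 when the active node is dead, which is the behaviour we would want from the delay rule sketched above.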