On 07/13/09 13:54, Sergei Kolodka wrote:
> Thanks Hartmut,
>
> It took me a couple of minutes to actually read about split brain clusters
> and I've edited my initial post a bit :-)
>
> Anyway, how about define inactive node in active/passive cluster instead as
> node without any resource groups running and introduce some kind of check for
> resource groups before nodes starting eliminating each other. I.e. if I'm not
> running anything I'm inactive and can wait for second or two before deleting
> active node key and actually deliberately let active node win in nodes shoot
> out.

In a cluster, all nodes are considered equal. Say you have a two-node cluster where node1 is running all the resource groups (RGs) and node2 isn't. There are a couple of scenarios:
1. A real split brain happens, i.e. both nodes are up but there is a network disconnect. In this case, giving priority to node1 makes sense: node2 loses the race and commits suicide (panics).
2. Node1 panics. To node2, this still looks like a split brain, because it cannot contact node1. If the algorithm is modified to give priority to node1, node2 would defer to a node that is no longer alive, and you would have a full cluster outage.

Hence, to make the algorithm work in all scenarios, we do not give priority based on which resource groups are hosted on which nodes. That said, I guess the algorithm could be enhanced.

>
> I'm not sure if timestamps are saved in quorum database but if they are the
> simple check if primary node of split-brain cluster was reachable from quorum
> server after standby node was not able to connect to it can give standby node
> idea that primary node is not so dead and might be able to continue work. In
> real life situation node which under load has no chance to win in this race
> with node which is just sitting and doing nothing. But I'm pretty sure I'm
> missing something important here.
>
> Regards,
> Sergei
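To make the two scenarios in the reply concrete, here is a toy Python model of the quorum race in a two-node cluster. It is a sketch under loose assumptions, not how the cluster software actually implements quorum. The Node class, the quorum_race() and run() helpers, the per-node arrival times and the three policy names ("equal", "strict_priority", "delayed") are all hypothetical. Reaching the quorum device first stands in for removing the other node's key, and the loser is assumed to panic. The "delayed" policy models Sergei's suggestion that the idle node wait a second or two before racing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    alive: bool       # False once the node has panicked
    hosts_rgs: bool   # True if the node currently runs resource groups
    arrival: float    # hypothetical seconds until it reaches the quorum device


def quorum_race(nodes: List[Node], policy: str, delay: float = 2.0) -> Optional[str]:
    """Return the name of the node that wins quorum, or None if nobody does."""
    contenders = []
    for n in nodes:
        if not n.alive:
            continue  # a dead node never reaches the quorum device
        t = n.arrival
        if not n.hosts_rgs:
            if policy == "strict_priority":
                continue  # idle node never races; it always yields to the active node
            if policy == "delayed":
                t += delay  # idle node backs off before racing
        contenders.append((t, n.name))
    if not contenders:
        return None
    return min(contenders)[1]  # first to arrive wins; the others are assumed to panic


def run(scenario: str, policy: str) -> str:
    # node1 is loaded and hosts all RGs, so it is assumed to arrive later than idle node2
    node1 = Node("node1", alive=(scenario != "node1_panic"), hosts_rgs=True, arrival=0.5)
    node2 = Node("node2", alive=True, hosts_rgs=False, arrival=0.1)
    winner = quorum_race([node1, node2], policy)
    if winner is None:
        return "full outage: no node holds quorum"
    if winner == "node1":
        return "node1 keeps quorum; RGs stay on node1"
    return "node2 keeps quorum; RGs fail over to node2"


if __name__ == "__main__":
    for scenario in ("partition", "node1_panic"):
        for policy in ("equal", "strict_priority", "delayed"):
            print(f"{scenario:12} {policy:16} {run(scenario, policy)}")
```

In this toy model, strict priority avoids the needless failover in the network-partition case but turns a node1 panic into a full outage, which is the point made in the reply. The delay-based variant keeps a node running in both cases at the cost of failing over a couple of seconds later. Real quorum handling (heartbeat timeouts, key reservation, fencing) is considerably more involved than this.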
