On 07/13/09 13:54, Sergei Kolodka wrote:
> Thanks Hartmut,
>
> It took me a couple of minutes to actually read about split brain clusters 
> and I've edited my initial post a bit :-)
>
> Anyway, how about defining the inactive node in an active/passive cluster as 
> the node without any resource groups running, and introducing some kind of 
> check for resource groups before the nodes start eliminating each other? I.e. 
> if I'm not running anything, I'm inactive and can wait a second or two before 
> deleting the active node's key, deliberately letting the active node win the 
> shoot-out. 
In a cluster all nodes are considered equal. Now say you have a 2-node 
cluster: node1 is running all the RGs and node2 isn't. There are a couple 
of scenarios:

1. A real split brain happens.
I.e. both nodes are up but there is a network disconnect. In this case, 
giving priority to node1 makes sense, and node2 commits suicide.

2. node1 panics.
To node2, this still looks like a split brain, as it cannot contact 
node1. If the algo is modified to give priority to node1, node2 will 
yield to a node that is already down, and you will have a full cluster 
outage.

Hence, in order to make the algo work in all scenarios, we do not give 
priority based on which groups are being hosted on which nodes.
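
To make the two scenarios concrete, here is a minimal Python sketch. It is 
only a toy model, not the actual cluster code: the names survivor, 
yield_to_active and treat_equal, and the idea of a fixed "race order" to 
the quorum device, are all made up for illustration. The point it shows is 
that a rule can only look at what a node knows locally (does it host any 
RGs?), since in both scenarios the peer simply looks unreachable.

```python
def yield_to_active(hosts_rgs):
    """Hypothetical priority rule: a node hosting no RGs does not contend."""
    return hosts_rgs


def treat_equal(hosts_rgs):
    """Hosted RGs are ignored: every live node contends for quorum."""
    return True


def survivor(alive, hosts_rgs, rule, race_order):
    """Return the name of the node that stays up, or None for a full outage.

    alive      -- dict of node name -> whether the node is actually running
    hosts_rgs  -- dict of node name -> whether it hosts any resource groups
    rule       -- function deciding, from local state only, whether to contend
    race_order -- order in which nodes would reach the quorum device (assumed)
    """
    for name in race_order:
        # A dead node cannot contend; a live node contends only if the rule allows.
        if alive[name] and rule(hosts_rgs[name]):
            return name  # first live contender wins the race
    return None


HOSTS = {"node1": True, "node2": False}        # node1 runs all the RGs
SPLIT_BRAIN = {"node1": True, "node2": True}   # scenario 1: both up, no network
NODE1_PANIC = {"node1": False, "node2": True}  # scenario 2: node1 is down
ORDER = ["node2", "node1"]                     # the idle node gets there first

if __name__ == "__main__":
    for label, alive in (("split brain", SPLIT_BRAIN), ("node1 panic", NODE1_PANIC)):
        for rule in (yield_to_active, treat_equal):
            print(label, rule.__name__, survivor(alive, HOSTS, rule, ORDER))
```

With the priority rule, the split-brain case keeps node1 up, but the node1 
panic case returns None, i.e. nobody survives. Treating the nodes as equal 
always leaves one survivor, at the cost of the idle node sometimes winning 
the race.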

Though I guess the algo could be enhanced.


>
> I'm not sure if timestamps are saved in the quorum database, but if they are, 
> a simple check of whether the primary node of the split-brain cluster was 
> reachable from the quorum server after the standby node lost contact with it 
> could give the standby node the idea that the primary is not so dead and might 
> be able to continue working. In a real-life situation, a node that is under 
> load has no chance of winning this race against a node that is just sitting 
> idle. But I'm pretty sure I'm missing something important here.
>
> Regards,
> Sergei
