Hi,

Please review the fix for 
http://bugs.opensolaris.org/view_bug.do?bug_id=6705938

Webrev at
http://cr.opensolaris.org/~tirth/webrev_6705938/
Bug synopsis: cmm fences off all nodes except the one node that has lost all interconnects

A brief description:
A split brain is being simulated, and the partition with only one node is 
fencing off all the other nodes. The fix introduces a delay to slow down the 
smaller partition. 
In the case of clusters with up to 4 nodes, each partition will be at least n/2, 
where n is the number of nodes.

For bigger clusters, we let the smaller partition go ahead if it has a 
sufficient number of nodes to tolerate further failures. We do it this way 
because it speeds up the cmm reconfiguration, which means a shorter service 
outage, and because the probability of an immediate second or third failure is 
low. Another assumption is that the administrators will soon notice the split 
brain, fix it, and bring the other nodes back online.
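
To make the intended behaviour easier to discuss, here is a rough Python sketch 
of the decision described above. It is only an illustration and not the code in 
the webrev: the function name, the delay value, and the "sufficient number of 
nodes" threshold are all assumptions, as is the reading that a partition 
holding at least half of the nodes is never delayed.

```python
# Hypothetical sketch only; none of these names or values come from the webrev.
SMALL_CLUSTER_MAX = 4   # "clusters with up to 4 nodes" from the description
MIN_TOLERANT_SIZE = 3   # assumed size needed "to tolerate further failures"
DELAY_SECONDS = 10.0    # assumed delay applied to the smaller partition


def reconfig_delay(partition_size: int, cluster_size: int) -> float:
    """Return how long a partition waits before going ahead with reconfiguration."""
    if partition_size <= 0 or partition_size > cluster_size:
        raise ValueError("partition_size must be in 1..cluster_size")
    # Assumption: a partition with at least n/2 nodes is never slowed down.
    if 2 * partition_size >= cluster_size:
        return 0.0
    # Small clusters: any partition below n/2 is delayed.
    if cluster_size <= SMALL_CLUSTER_MAX:
        return DELAY_SECONDS
    # Bigger clusters: a smaller partition goes ahead immediately if it still
    # has enough nodes to tolerate further failures.
    if partition_size >= MIN_TOLERANT_SIZE:
        return 0.0
    return DELAY_SECONDS
```

Under these assumed values, a single isolated node in a 4-node cluster is 
delayed, while a 3-node partition in an 8-node cluster goes ahead without 
waiting.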

Please send all your reviews by 21st Aug 2008. 

Thanks,
Tirthankar 
http://blogs.sun.com/tirthankar
--

This message posted from opensolaris.org

