Hi,

Please review the fix for
http://bugs.opensolaris.org/view_bug.do?bug_id=6705938
"cmm fences off all nodes except the one node that has lost all interconnects"

Webrev at http://cr.opensolaris.org/~tirth/webrev_6705938/

A brief description:

When a split brain is simulated, the partition with only one node fences off all the other nodes. The fix introduces a delay to slow down the smaller partition.

In clusters with up to 4 nodes, each partition will be at least n/2 in size, where n is the number of nodes. For bigger clusters, we let the smaller partition go ahead if it has a sufficient number of nodes to tolerate further failures. We do it this way because it speeds up the cmm reconfiguration, which means less service outage, and because the probability of an immediate second or third failure is low. Another assumption is that the administrators will soon notice the split brain, try to fix it, and bring the other nodes back online.

Please send all your reviews by 21st Aug 2008.

Thanks,
Tirthankar
http://blogs.sun.com/tirthankar

--
This message posted from opensolaris.org
