Notes from Colorado review Sept 18, 2008

These are my notes from the openhacluster review of Colorado this morning, Sept 18, 2008. The notes are skimpy/sparse.
1. Split-brain and recovery discussion
Ashu and others want us to consider refinements/restrictions that minimize the consequences of split brain. They want us to consider disallowing CCR updates after a suspected split brain, with an admin action required to re-enable them. They also want us to try to leverage intentional shutdown and panic shutdown (with its last-gasp "I am panicking" message) so that we do not enter that mode. The issue of software other than the admin tools that updates the CCR was also mentioned.

2. Mode of doing nothing on suspected split brain
Nils (? I did not catch his name) advocated the Veritas approach, where services do not move after a split brain or suspected split brain. Of course, this is in the context of two-node clusters without quorum.

3. Why require an explicit enable of the software after package install? Can we not determine in software that the package is installed on all cluster hosts?

4. Split-brain recovery
Observe that the issue of recovery when a split brain heals, i.e., when the nodes attempt to rejoin each other, pertains to more than the CCR; it also pertains to ZFS and to AVS SNDR. We need to explain our recovery approach and our recovery procedure (even if it is manual). Nils (?) mentioned that the difficulty of recovery re-emphasizes the need for a mode of operation where services do not move after a suspected split brain.

5. Performance
Membership reconfiguration is likely to be slower with weak quorum than with traditional quorum, i.e., detection time is longer. Recovery performance is also worse, in that a human has to perform manual recovery in more cases.

This is all my notes say; I realize more was discussed.

Andy
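The CCR lockout idea raised in item 1 can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not anything agreed at the review: the names `DepartureReason`, `CcrUpdateGuard`, and `CcrUpdatesDisabled` are hypothetical, the CCR is modeled as a plain dict, and it assumes a node can distinguish an announced shutdown or a last-gasp panic message from a silent loss of contact with its peer.

```python
from enum import Enum


class DepartureReason(Enum):
    """How a node learned its peer left (hypothetical categories, not a real API)."""
    INTENTIONAL_SHUTDOWN = "intentional_shutdown"  # peer announced an orderly shutdown
    PANIC_LAST_GASP = "panic_last_gasp"            # peer sent an "I am panicking" message
    SILENT_LOSS = "silent_loss"                    # contact lost with no message at all


class CcrUpdatesDisabled(Exception):
    """Raised when a CCR write is attempted while updates are locked out."""


class CcrUpdateGuard:
    """Hypothetical gate placed in front of every CCR writer, admin tool or not."""

    def __init__(self) -> None:
        self.updates_enabled = True

    def on_peer_departure(self, reason: DepartureReason) -> None:
        # Only a silent loss of contact is treated as a suspected split brain.
        # An announced shutdown or a last-gasp panic means the peer is known
        # to be down, so updates stay enabled.
        if reason is DepartureReason.SILENT_LOSS:
            self.updates_enabled = False

    def admin_reenable(self) -> None:
        # Explicit administrator action, taken once the split-brain question is settled.
        self.updates_enabled = True

    def write(self, ccr: dict, key: str, value: str) -> None:
        # Every writer goes through this check; the dict stands in for the real CCR.
        if not self.updates_enabled:
            raise CcrUpdatesDisabled(
                f"CCR update to {key!r} blocked after suspected split brain")
        ccr[key] = value


if __name__ == "__main__":
    guard = CcrUpdateGuard()
    ccr = {}
    guard.on_peer_departure(DepartureReason.PANIC_LAST_GASP)
    guard.write(ccr, "example_key", "v1")      # allowed: peer announced it was going down
    guard.on_peer_departure(DepartureReason.SILENT_LOSS)
    try:
        guard.write(ccr, "example_key", "v2")  # blocked: suspected split brain
    except CcrUpdatesDisabled as err:
        print(err)
    guard.admin_reenable()
    guard.write(ccr, "example_key", "v2")      # allowed again after the admin action
    print(ccr)
```

Routing every writer through the same guard, rather than only the admin tools, is one possible answer to the question of non-admin software updating the CCR.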