Notes from Colorado review
Sept 18, 2008

These are my notes from the openhacluster review of
Colorado this morning, Sept 18, 2008.  The notes
are sparse.

1.  Split brain and recovery discussion

Ashu and others want us to consider refinements/restrictions
to minimize the consequences of split-brain.  They want us to
consider outlawing ccr updates after a suspected split-brain,
with an admin action required to re-enable them.  They also
want us to try to leverage intentional shutdown and panic
shutdown (with its last gasp "I am panicking" message) so that
a node departure known to be a shutdown or a panic does not
put us in that restricted mode.

The issue of software other than the admin tools that updates
ccr was also mentioned.
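
A minimal sketch of the restriction as I understood it, in Python
for brevity.  All names here (Departure, CcrUpdateGate, and so on)
are made up for illustration and are not existing cluster
interfaces.  It assumes we can classify how a peer left: announced
shutdown, announced panic, or unexplained.

```python
from enum import Enum


class Departure(Enum):
    """How a peer node left the cluster (hypothetical classification)."""
    INTENTIONAL_SHUTDOWN = "intentional shutdown"
    PANIC_ANNOUNCED = "panic announced"   # last gasp message was received
    UNEXPLAINED = "unexplained"           # peer just went silent


class CcrUpdateGate:
    """Hypothetical gate that every ccr update would have to pass."""

    def __init__(self):
        self.updates_enabled = True

    def peer_departed(self, how):
        # Only an unexplained departure is treated as a suspected
        # split-brain; announced shutdowns and panics leave updates on.
        if how is Departure.UNEXPLAINED:
            self.updates_enabled = False

    def admin_reenable(self):
        # Explicit admin action after the situation has been checked.
        self.updates_enabled = True

    def check_update_allowed(self):
        if not self.updates_enabled:
            raise PermissionError(
                "ccr updates disabled after suspected split-brain")
```

Software other than the admin tools that updates ccr would need to
go through the same check, which is where the issue above comes in.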

2.  Mode of do nothing on suspected split brain

Nils (? I did not catch his name) advocated the Veritas
approach, where services do not move after a split brain
or suspected split brain.  Of course, this is in the
context of a two-node cluster without quorum.
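
A rough sketch of what such a mode could look like as a takeover
decision, in Python for brevity.  The function and parameter names
are hypothetical, and it is my assumption (not something stated in
the review) that a confirmed shutdown or panic of the peer would
still allow takeover.

```python
def should_take_over(peer_heartbeat_lost,
                     peer_departure_confirmed,
                     hold_on_suspicion=True):
    """Decide whether the surviving node moves the peer's services.

    peer_heartbeat_lost      -- True if we no longer hear from the peer
    peer_departure_confirmed -- True if the peer announced a shutdown
                                or panic before going silent
    hold_on_suspicion        -- True selects the "do nothing on
                                suspected split brain" mode
    """
    if not peer_heartbeat_lost:
        return False      # peer is alive, nothing to take over
    if peer_departure_confirmed:
        return True       # peer is known to be down, not a split brain
    # Silence without confirmation: suspected split brain.
    return not hold_on_suspicion
```

With hold_on_suspicion=True, an unexplained loss of the peer leaves
services where they are until an admin intervenes.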

3.  Why is an explicit enable of the software needed after
package install?  Can we not compute in software that the
package is installed on all cluster hosts?

4.  Split-brain recovery

Observe that the issue of recovery when split-brain heals, i.e.,
when nodes attempt to rejoin with each other, pertains to
more than ccr; it also pertains to ZFS and to AVS SNDR.
We need to explain our recovery approach and our recovery
procedure (even if it is manual).  Nils(?) mentioned that
the difficult recovery re-emphasizes the need for a mode of
operation where services do not move after a suspected split
brain.

5.  Performance

Performance of membership reconfiguration is likely to be
worse with weak quorum than with traditional quorum, i.e.,
detection time is longer.

Performance of recovery is worse in that a human has to do
manual recovery in more cases.

This is all my notes say; I realize more was discussed.
Andy

