Sambit,

Some minor changes in the user-visible text:

ccradm.cc
- Change "...changes after split-brain" to "...changes after split brain"
  (no hyphen is needed when there is no noun being modified).

updatable_copy_impl.cc
- In the SCMSGS user action, change "split-brain" to "split brain", but
  add a hyphen to change "weak membership model" to "weak-membership model"
  (here "weak-membership" is modifying the noun "model").

path_manager.cc
- In the PM_DBG messages immediately before the SCMSGS text, change
  "done after split-brain" to "done after split brain" (no hyphen).
- In the SCMSGS user action, change "non-cluster mode" to "noncluster mode"
  (multiple instances).
- In the error message itself, change "after split-brain" to
  "after split brain".

quorum_impl.cc
- In the SCMSGS explanation, change "...determine the required amount of
  quorum votes required for survival..." to "...determine the number of
  quorum votes that are required for survival...".
- Change "CMM: We could not..." to "CMM: Could not..." (multiple instances).

Otherwise, user-visible text looks good.

Thanks.

Lisa Shepherd
Sun Cluster Technical Publications
"We're the M in RTFM"

On 12/15/08 01:39, Sambit Nayak wrote:
> Hi everyone,
>
> Please review the changes in CMM and CCR that support
> the new "weaker" form of membership being introduced
> with Project Colorado. This "weaker" form of membership
> allows multiple partitions to survive if a split brain
> situation develops.
>
> You can access the webrev at:
> http://cr.opensolaris.org/~samnayak/colorado-I-CMM-CCR/
>
>
> *********
> Here is a summary of the CMM and CCR changes.
> Please refer to the Requirements Document for Project Colorado
> to understand what these changes in CMM and CCR are trying to achieve.
>
>
> SUMMARY OF CMM CHANGES
> ----------------------
> 1. cmm_impl::read_membership_info_from_ccr() reads membership properties
>    from CCR and stores them into the CMM 'conf' structure
>    (structure declaration in cmm_config.h).
>
>    The properties read from the infrastructure table in CCR are
>    a. multiple_partitions : true/false, which indicates whether
>       the cluster allows multiple partitions to survive
>    b. ping_targets : comma-separated list of IP addresses
>       that the cluster nodes ping to check their own health
>       during CMM reconfiguration. This ping is done only
>       if the cluster allows multiple partitions to survive
>       (weaker form of membership)
>
>    This 'read' is done when the cluster node first boots up,
>    and also when the infrastructure table (that stores these properties)
>    is modified.
>
>
> 2. CMM has methods register_cmm_for_infr_callback() and
>    unregister_cmm_for_infr_callback() to register/unregister with CCR
>    for infrastructure table update callbacks.
>    When the infrastructure table changes, CCR delivers callbacks to CMM
>    and CMM reads in its required membership information.
>
>    The callback object registered by CMM is of type infr_cb_impl_for_cmm
>    (introduced in this set of changes).
>
>
> 3. cmm_config.h declares the 'conf' structure that holds
>    properties read in by CMM from CCR.
>    We add a boolean value called multiple_partitions and
>    a list of strings called ping_targets that stores the IP addresses.
>
> 4. During CMM reconfiguration, the CMM automaton does the ping check
>    if the cluster is configured to allow multiple partitions to survive.
>    This functionality is implemented in automaton_impl::ping_health_check().
>    It essentially does a door upcall to the userland qd_userd daemon
>    in order to execute the ping.
>    If the door upcall cannot be performed or the ping fails,
>    then the node is panicked by the automaton, which considers
>    the ping check to have failed.
>
> 5. For the weaker form of membership that allows multiple partitions
>    to survive, we alter the definition of required quorum votes.
>    For the usual strong form of membership, the definition was:
>       Q = (V/2) + 1 where Q is the required quorum votes and V is the total votes.
>    For the new weaker form of membership, the definition is:
>       Q = 1, if V = 1 or 2;
>    and we do not allow the weaker form of membership if V > 2.
>    If the cluster has more than 2 nodes, the strong form of membership is used.
>
> 6. When a node tries to join another node, a path is first formed
>    before CMM knows that a remote node is reachable and starts reconfiguration.
>    When a node receives a request to form a path,
>    path_manager::update_node_incarnation() does checks based on
>    incarnation numbers before allowing the path formation.
>    An additional check is introduced here to see if the local node
>    has any CCR changes done during split-brain that are still unresolved.
>    If so, then the node will reject the path formation request.
>    Thus, a node that has unresolved post-split-brain CCR changes
>    will not allow other nodes to join it.
>    The CCR changes have to be resolved (nodes have to be marked winner/loser)
>    before the nodes can join.
>
>
> SUMMARY OF CCR CHANGES
> ----------------------
>
> 1. Flag To Indicate CCR Change During Split-Brain
>
>    Existence of a file "/etc/cluster/.split_brain_ccr_change"
>    on a cluster node serves as an indicator that CCR changes
>    were done during split-brain on that node.
>
>    The file is created (if it doesn't already exist) upon
>    successful completion of a CCR transaction during split-brain.
>
>    The file is defined as SPLIT_BRAIN_CHANGE_FILE:
>
>    #define SPLIT_BRAIN_CHANGE_FILE "/etc/cluster/.split_brain_ccr_change"
>
>    It is created with 600 permission bits, owned by root.
>
>    The split-brain change file will be created at the end of
>    updatable_copy_impl::commit_transaction(), if all the previous
>    steps of the commit were successful.
>
> 2. New convenience functions in os class
>
>    os::file_create() to create files.
>    The creation mode will be O_EXCL | O_CREAT.
>
>    os::file_exists() will be used to query whether or not
>    there is a CCR change during split-brain on this node.
>
> 3. How To Indicate A Table Was Changed During Split-Brain
>
>    No special indication is present in the changed table itself.
>    Let the generation number increase normally whenever the table
>    is changed. The presence of the SPLIT_BRAIN_CHANGE_FILE will
>    indicate that some change exists.
>
> 4. How To Select The Winning CCR Copy
>
>    The administrator will run a CLI utility to mark the CCR
>    on a cluster node as the truth copy.
>    (The CLI is not part of the present set of changes,
>    but the CCR interfaces that it will use are present.)
>
>    The CCR interface will delete the SPLIT_BRAIN_CHANGE_FILE.
>    The CCR tables themselves are not modified.
>
> 5. How To Select The Loser CCR Copy
>
>    The administrator will run a CLI utility to mark the CCR
>    on a cluster node as the losing copy.
>    (The CLI is not part of the present set of changes,
>    but the CCR interfaces that it will use are present.)
>
>    The CCR interface will delete the SPLIT_BRAIN_CHANGE_FILE,
>    and then mark every table on the losing node
>    as invalid (generation number -1). Marking every
>    table with this generation number -1 will ensure that
>    when this node joins the winner node to form a cluster,
>    it will update its CCR copy using the winner node's CCR.
>    The loser node also cannot form a cluster of its own.
>
>    Note that this command to mark the loser has to be run
>    in non-cluster mode. The node must then be rebooted
>    into cluster mode.
>
> 6. CCR Interfaces Provided
>
>    a. Interface to query if this node has split-brain CCR change(s).
>       Exists both in kernel and userland.
>       Implementation : Check if SPLIT_BRAIN_CHANGE_FILE exists.
>
>       bool ccrlib::split_brain_ccr_change_exists()
>
>    b. Interface to mark this node's CCR as winning copy.
>       Removes SPLIT_BRAIN_CHANGE_FILE.
>       Exists in userland.
>
>       int ccrlib::mark_ccr_as_winner()
>
>    c. Interface to mark this node's CCR as invalid until cluster join;
>       in other words, mark this node's CCR as loser copy.
>       Removes SPLIT_BRAIN_CHANGE_FILE.
>       Sets the gennum of every CCR table to -1.
>       Exists in userland.
>
>       int ccrlib::mark_ccr_as_loser()
>
>
> 7. Internal utility - ccradm - is modified to provide the following
>    features until the CLI is available.
>    a. ccradm -q : check whether a split-brain CCR change exists
>    b. ccradm -m <"winner"|"loser"> :
>       mark a node as the "winner" or "loser"
>
>
> ***********
>
>
> Apologies for the short notice;
> we are targeting a putback to Colorado staging gate on Dec 17
> if there are no major objections,
> and hence would request quick reviews. :)
>
>
>
> Thanks & Regards,
> Zoram, Sambit
>
>
>
> _______________________________________________
> ha-clusters-discuss mailing list
> ha-clusters-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss
>
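
The required-quorum rule described in item 5 of the CMM summary can be
sketched as follows. This is only an illustration under stated assumptions,
not code from the webrev: the function name required_quorum_votes() and its
parameters are hypothetical, and V/2 is assumed to be integer division.

```cpp
// Hypothetical helper (not taken from the CMM sources): returns Q, the
// number of quorum votes a partition needs in order to survive.
//   total_votes          V, the total number of configured votes
//   multiple_partitions  true if the weaker form of membership is configured
unsigned required_quorum_votes(unsigned total_votes, bool multiple_partitions)
{
    // Weaker membership is honored only when V is 1 or 2; then Q = 1.
    if (multiple_partitions && (total_votes == 1 || total_votes == 2)) {
        return 1;
    }
    // Strong membership, which is also used when V > 2 even if the weaker
    // form was requested: Q = (V/2) + 1, assuming integer division.
    return total_votes / 2 + 1;
}
```

For example, assuming a two-node cluster in which each node holds one vote
(V = 2), the strong rule gives Q = 2, so neither node can survive alone.
The weaker rule gives Q = 1, so each single-node partition can survive,
which is the behavior the summary describes for split brain.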