Commit:     222d396092acc11b4af03bede309aa066945e920
Parent:     a1bc86e6bddd34362ca08a3a4d898eb4b5c15215
Author:     David Teigland <[EMAIL PROTECTED]>
AuthorDate: Mon Jan 15 10:28:22 2007 -0600
Committer:  Steven Whitehouse <[EMAIL PROTECTED]>
CommitDate: Mon Feb 5 13:36:58 2007 -0500

    [DLM] fix master recovery

    If master recovery happens on an rsb in one recovery sequence, and that
    sequence is aborted before lock recovery happens, then in the next
    sequence we rely on the previous master recovery (which may now be
    invalid due to another node ignoring a lookup result) and go on to do
    the lock recovery, where we get stuck due to an invalid master value.

     recovery cycle begins: master of rsb X has left
     nodes A and B send node C an rcom lookup for X to find the new master
     C gets lookup from B first, sets B as new master, and sends reply back to B
     C gets lookup from A next, and sends reply back to A saying B is master
     A gets lookup reply from C and sets B as the new master in the rsb
     recovery cycle on A, B and C is aborted to start a new recovery
     B gets lookup reply from C and ignores it since there's a new recovery
     recovery cycle begins: some other node has joined
     B doesn't think it's the master of X so it doesn't rebuild it in the 
     C looks up the master of X, no one is master, so it becomes new master
     B looks up the master of X, finds it's C
     A believes that B is the master of X, so it sends its lock to B
     B sends an error back to A
     A resends
     this repeats forever, the incorrect master value on A is never corrected

    The fix is to do master recovery on an rsb that still has the NEW_MASTER
    flag set from an earlier recovery sequence, and therefore didn't complete
    lock recovery.

    Signed-off-by: David Teigland <[EMAIL PROTECTED]>
    Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]>
 fs/dlm/recover.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index a7fa4cb..c2cc769 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -397,7 +397,9 @@ int dlm_recover_masters(struct dlm_ls *ls)
                if (dlm_no_directory(ls))
                        count += recover_master_static(r);
-               else if (!is_master(r) && dlm_is_removed(ls, r->res_nodeid)) {
+               else if (!is_master(r) &&
+                        (dlm_is_removed(ls, r->res_nodeid) ||
+                         rsb_flag(r, RSB_NEW_MASTER))) {