CVSROOT:        /cvs/cluster
Module name:    cluster
Branch:         RHEL4
Changes by:     [EMAIL PROTECTED]       2008-01-14 15:35:30

Modified files:
        gfs-kernel/src/dlm: mount.c 

Log message:
        bz 324881
        
        It's easy to tell if you've hit this bug, because a message like this will
        always appear in /var/log/messages:
        
        SM: 02000378 ignoring service callback id=2000144 event=1324
        
        If you look at /proc/cluster/lock_dlm/debug on this node at this point,
        you'll see something like this at the end, which shows what the problem
        is:
        
        others_may_mount start_done 1322 b
        
        The event_id that others_may_mount uses when calling kcl_start_done()
        is incorrect; it's using 1322 when it should be 1324.
        
        I believe the fix is for others_may_mount() to read the event_id
        after taking the unmount_lock semaphore, which serializes
        others_may_mount() with a start callback from the lock_dlm thread.
        In this case, I believe the start callback is changing the event_id
        after others_may_mount() reads it and before others_may_mount() gets
        the unmount_lock semaphore.
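
        The ordering problem described above can be illustrated with a small
        single-threaded model. This is only a sketch, not the lock_dlm code:
        struct mount_model, model_down(), model_up(), start_done_id_old() and
        start_done_id_new() are hypothetical names, the lock is a plain flag
        rather than a kernel semaphore, and the racing start callback is
        invoked explicitly at the point where it is assumed to run.

```c
#include <assert.h>

/* Hypothetical single-threaded model of the race; not the lock_dlm code. */
struct mount_model {
    int lock_held;      /* stands in for the unmount_lock semaphore */
    int mg_last_start;  /* event_id recorded by the latest start callback */
};

/* Model of down(): the lock must be free when it is taken. */
static void model_down(struct mount_model *m)
{
    assert(!m->lock_held);
    m->lock_held = 1;
}

/* Model of up(): the lock must be held when it is released. */
static void model_up(struct mount_model *m)
{
    assert(m->lock_held);
    m->lock_held = 0;
}

/* Start callback: records a new event_id while holding the lock. */
static void start_callback(struct mount_model *m, int event_id)
{
    model_down(m);
    m->mg_last_start = event_id;
    model_up(m);
}

/* Old ordering: read the event_id first, take the lock afterwards.
 * The callback is assumed to run in the window between the two. */
static int start_done_id_old(struct mount_model *m, int racing_event)
{
    int last_start = m->mg_last_start;  /* read before the lock is held */

    start_callback(m, racing_event);    /* callback slips into the window */

    model_down(m);
    model_up(m);
    return last_start;                  /* stale event_id */
}

/* New ordering: take the lock first, then read the event_id.
 * The callback needs the same lock, so it can only finish before
 * the lock is taken here; the read then sees its update. */
static int start_done_id_new(struct mount_model *m, int racing_event)
{
    int last_start;

    start_callback(m, racing_event);    /* completes before we get the lock */

    model_down(m);
    last_start = m->mg_last_start;      /* read while holding the lock */
    model_up(m);
    return last_start;
}
```

        Starting from mg_last_start = 1322 with a racing start callback for
        event 1324, the old ordering returns 1322 and the new ordering
        returns 1324, which matches the event ids in the messages quoted
        above. The real function also reads mg_last_start under async_lock;
        that detail is left out of the model.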

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/dlm/mount.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.11.2.3&r2=1.11.2.4

--- cluster/gfs-kernel/src/dlm/Attic/mount.c    2005/06/29 07:28:21     1.11.2.3
+++ cluster/gfs-kernel/src/dlm/Attic/mount.c    2008/01/14 15:35:30     1.11.2.4
@@ -316,11 +316,12 @@
                return;
        }
 
+       down(&dlm->unmount_lock);
+
        spin_lock(&dlm->async_lock);
        last_start = dlm->mg_last_start;
        spin_unlock(&dlm->async_lock);
 
-       down(&dlm->unmount_lock);
        set_bit(DFL_OTHERSMAYMOUNT, &dlm->flags);
 
        /* There's been a start to add a second node while we've been
