CVSROOT:        /cvs/cluster
Module name:    cluster
Branch:         RHEL4
Changes by:     [EMAIL PROTECTED]       2007-11-07 15:22:31

Modified files:
        dlm-kernel/src : lockqueue.c 

Log message:
        bz 349001
        
        For the entire life of the dlm, there's been an annoying issue that we've
        worked around and not "fixed" directly.  It's the source of all these
        messages:
        
        process_lockqueue_reply id 2c0224 state 0
        
        The problem is that a lock master sends an async "granted" message for
        a convert request *before* actually sending the reply for the original
        convert.  The work-around is that the requesting node just takes the
        granted message as an implicit reply to the conversion and ignores the
        convert reply when it arrives later (the message above is printed when
        it gets the out-of-order reply for its convert).  Apart from the
        annoying messages, it's never been a problem.
        
        Now we've found a case where it's a real problem:
        
        1. nodeA: send convert PR->CW to nodeB
           nodeB: send granted message to nodeA
           nodeB: send convert reply to nodeA
        2. nodeA: receive granted message for conversion
           complete request, sending ast to gfs
        3. nodeA: send convert CW->EX to nodeB
        4. nodeA: receive reply for convert in step 1, which we ordinarily
           ignore, but since another convert has been sent, we mistake this
           message as the reply for the convert in step 3, and complete
           the convert request which is *not* really completed yet
        5. nodeA: send unlock to nodeB
           nodeB: complains about an unlock during a conversion
        
        The fix is to have nodeB not send a convert reply if it has already
        sent a granted message.  (We already do this for cases where the
        conversion is granted when first processing it, but we don't in cases
        where the grant is done after processing the convert.)
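
        The sequence in steps 1-4 can be replayed with a small toy model.
        This is only a sketch under stated assumptions: struct requester,
        the mode enum, and every function name below are hypothetical and
        are not the real dlm structures or lockqueue.c code.  It models
        nodeA's view only, with master_skips_reply standing in for whether
        nodeB suppresses the convert reply once it has sent a grant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock modes; only the three used in the log message. */
enum mode { PR, CW, EX };

/* Hypothetical requester-side (nodeA) state, not the real lkb. */
struct requester {
    enum mode granted;      /* mode nodeA believes it holds */
    enum mode requested;    /* mode asked for by the pending convert */
    bool convert_pending;   /* a convert was sent and is not yet completed */
};

static void send_convert(struct requester *r, enum mode m)
{
    assert(!r->convert_pending);    /* one conversion at a time */
    r->requested = m;
    r->convert_pending = true;
}

/* Async "granted" message: taken as an implicit convert reply. */
static void recv_granted(struct requester *r, enum mode m)
{
    r->granted = m;
    r->convert_pending = false;
}

/*
 * Explicit convert reply.  It carries nothing here that ties it to a
 * particular convert, so it completes whichever convert is pending.
 * Returns false when nothing is pending (the ordinarily ignored case).
 */
static bool recv_convert_reply(struct requester *r)
{
    if (!r->convert_pending)
        return false;
    r->granted = r->requested;
    r->convert_pending = false;
    return true;
}

/*
 * Replay steps 1-4 and return the mode nodeA believes it holds.
 * nodeB has granted only CW at that point.
 */
enum mode replay(bool master_skips_reply)
{
    struct requester a = { PR, PR, false };
    bool stale_reply_in_flight;

    send_convert(&a, CW);                        /* step 1 */
    stale_reply_in_flight = !master_skips_reply; /* nodeB's late reply */
    recv_granted(&a, CW);                        /* step 2 */
    send_convert(&a, EX);                        /* step 3 */
    if (stale_reply_in_flight)
        recv_convert_reply(&a);                  /* step 4 */
    return a.granted;
}
```

        In this model replay(false) ends with nodeA believing it holds EX
        although nodeB has granted only CW, while replay(true) leaves nodeA
        at CW with the second convert still pending.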

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/cluster/dlm-kernel/src/lockqueue.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.37.2.9&r2=1.37.2.10

--- cluster/dlm-kernel/src/Attic/lockqueue.c    2006/01/24 14:38:19     1.37.2.9
+++ cluster/dlm-kernel/src/Attic/lockqueue.c    2007/11/07 15:22:31     1.37.2.10
@@ -590,6 +590,14 @@
        req->rr_lvbseq = lkb->lkb_lvbseq;
        add_request_lvb(lkb, req);
 
+       /* prevent a convert reply that hasn't been sent yet, the grant message
+          will serve as an implicit convert reply */
+       if (lkb->lkb_request) {
+               log_debug(lkb->lkb_resource->res_ls, "skip convert reply %x "
+                         "gr %d\n", lkb->lkb_id, lkb->lkb_grmode);
+               lkb->lkb_request = NULL;
+       }
+
        midcomms_send_buffer(&req->rr_header, e);
 }
 
