Re: [Ocfs2-devel] [PATCH] ocfs2: fix dlm lock migration crash

Srinivas Eeda Mon, 24 Feb 2014 15:29:42 -0800

Junxiao, thanks for looking into this issue. Please see my comment below


On 02/24/2014 01:07 AM, Junxiao Bi wrote:

Hi,

On 07/19/2012 09:59 AM, Sunil Mushran wrote:

Different issues.

On Wed, Jul 18, 2012 at 6:34 PM, Junxiao Bi <junxiao...@oracle.com<mailto:junxiao...@oracle.com>> wrote:


    On 07/19/2012 12:36 AM, Sunil Mushran wrote:

    This bug was detected during code audit. Never seen a crash. If
    it does hit,
    then we have bigger problems. So no point posting to stable.

I read a lot of dlm recovery code recently, I found this bug couldhappen at the following scenario.


node 1:                                             migrate target node x:
dlm_unregister_domain()
 dlm_migrate_all_locks()
  dlm_empty_lockres()
   select node x as migrate target node
   since there is a node x lock on the granted list.
   dlm_migrate_lockres()
    dlm_mark_lockres_migrating() {
     wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
<<< node x unlock may happen here, res->granted list can be empty.

If the unlock request got sent at this point, and if the request was*processed*, lock must have been removed from the granted_list. If therequest was *not yet processed*, then the DLM_LOCK_RES_MIGRATING set indlm_lockres_release_ast would make dlm_unlock handler to returnDLM_MIGRATING to the caller (in this case node x). So I don't see howgranted_list could have stale lock. Am I missing something ?

I do think there is such race that you pointed below exist, but I am notsure if it was due to the above race described.

dlm_lockres_release_ast(dlm, res);
    }
    dlm_send_one_lockres()
dlm_process_recovery_data() {

tmpq isres->granted list and is empty.

list_for_each_entry(lock, tmpq, list) {

if(lock->ml.cookie != ml->cookie)

                                                      lock = NULL;
                                                     else
                                                      break;
                                                    }

lock will beinvalid here.if (lock->ml.node!= ml->node)BUG() -->crash here.

                                                   }

Thanks,
Junxiao.


    Our customer can reproduce it. Also I saw you were assigned a
    similar bug before, see
    https://oss.oracle.com/bugzilla/show_bug.cgi?id=1220, is it the
    same BUG?


    On Tue, Jul 17, 2012 at 6:36 PM, Junxiao Bi
    <junxiao...@oracle.com <mailto:junxiao...@oracle.com>> wrote:

        Hi Sunil,

        On 07/18/2012 03:49 AM, Sunil Mushran wrote:

        On Tue, Jul 17, 2012 at 12:10 AM, Junxiao Bi
        <junxiao...@oracle.com <mailto:junxiao...@oracle.com>> wrote:

            In the target node of the dlm lock migration, the logic
            to find
            the local dlm lock is wrong, it shouldn't change the
            loop variable
            "lock" in the list_for_each_entry loop. This will cause
            a NULL-pointer
            accessing crash.

            Signed-off-by: Junxiao Bi <junxiao...@oracle.com
            <mailto:junxiao...@oracle.com>>
            Cc: sta...@vger.kernel.org <mailto:sta...@vger.kernel.org>
            ---
             fs/ocfs2/dlm/dlmrecovery.c |   12 +++++++-----
             1 file changed, 7 insertions(+), 5 deletions(-)

            diff --git a/fs/ocfs2/dlm/dlmrecovery.c
            b/fs/ocfs2/dlm/dlmrecovery.c
            index 01ebfd0..0b9cc88 100644
            --- a/fs/ocfs2/dlm/dlmrecovery.c
            +++ b/fs/ocfs2/dlm/dlmrecovery.c
            @@ -1762,6 +1762,7 @@ static int
            dlm_process_recovery_data(struct dlm_ctxt *dlm,
                    u8 from = O2NM_MAX_NODES;
                    unsigned int added = 0;
                    __be64 c;
            +       int found;

                    mlog(0, "running %d locks for this lockres\n",
            mres->num_locks);
                    for (i=0; i<mres->num_locks; i++) {
            @@ -1793,22 +1794,23 @@ static int
            dlm_process_recovery_data(struct dlm_ctxt *dlm,
                                    /* MIGRATION ONLY! */
            BUG_ON(!(mres->flags & DLM_MRES_MIGRATION));

            +                       found = 0;
            spin_lock(&res->spinlock);
                                    for (j = DLM_GRANTED_LIST; j <=
            DLM_BLOCKED_LIST; j++) {
                                            tmpq =
            dlm_list_idx_to_ptr(res, j);
            list_for_each_entry(lock, tmpq, list) {
            -                                       if
            (lock->ml.cookie != ml->cookie)
            -       lock = NULL;
            - else
            +                                       if
            (lock->ml.cookie == ml->cookie) {
            +       found = 1;
                  break;
            +                                       }
                                            }
            -                               if (lock)
            +                               if (found)
            break;
                                    }

                                    /* lock is always created
            locally first, and
                                     * destroyed locally last.  it
            must be on the list */
            -                       if (!lock) {
            +                       if (!found) {
                                            c = ml->cookie;
            mlog(ML_ERROR, "Could not find local lock "
                 "with cookie %u:%llu, node %u, "



        
https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=blobdiff;f=fs/ocfs2/dlm/dlmrecovery.c;h=c881be6043a8c27c26ee44d217fb8ecf1eb37e02;hp=01ebfd0bdad72264b99345378f0c6febe246503d;hb=13279667cc8bbaf901591dee96f762d4aab8b307;hpb=a5ae0116eb56ec7c128e84fe15646a5cb9a8cb47


        We had decided to go back to list_for_each().


        OK, thank you. It's OK to revert it back for a introduced
        bug. But I think you'd better cc stable branch.




_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix dlm lock migration crash

Reply via email to