In dlm_assert_master_handler, the dlm spinlock is released a little too soon, which creates a window for two nodes to race during mastery.
Consider a scenario where node A had a head start during lock mastery and the dlm spinlock has just been released on node B in dlm_assert_master_handler. At that moment a process on node B starts to master the same resource. It finds the mle but does not find the lockres; since mle->master is not set, it creates a lockres and waits for mastery without sending a master request. dlm_assert_master_handler knows nothing about this and does not set have_lockres_ref (i.e. does not respond with DLM_ASSERT_RESPONSE_MASTERY_REF), leaving a hole that results in the loss of the refmap bit on the master node.

Signed-off-by: Srinivas Eeda <srinivas.e...@oracle.com>
---
 fs/ocfs2/dlm/dlmmaster.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 83bcaf2..6e694b6 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -1875,7 +1875,6 @@ int dlm_assert_master_handler(struct o2net_msg *msg, u32 len, void *data,
 ok:
 		spin_unlock(&res->spinlock);
 	}
-	spin_unlock(&dlm->spinlock);
 
 	// mlog(0, "woo! got an assert_master from node %u!\n",
 	//	     assert->node_idx);
@@ -1926,7 +1925,6 @@ ok:
 		/* master is known, detach if not already detached.
 		 * ensures that only one assert_master call will happen
 		 * on this mle. */
-		spin_lock(&dlm->spinlock);
 		spin_lock(&dlm->master_lock);
 
 		rr = atomic_read(&mle->mle_refs.refcount);
@@ -1959,7 +1957,6 @@ ok:
 			__dlm_put_mle(mle);
 		}
 		spin_unlock(&dlm->master_lock);
-		spin_unlock(&dlm->spinlock);
 	} else if (res) {
 		if (res->owner != assert->node_idx) {
 			mlog(0, "assert_master from %u, but current "
@@ -1967,6 +1964,7 @@ ok:
 			     res->owner, namelen, name);
 		}
 	}
+	spin_unlock(&dlm->spinlock);
 
 done:
 	ret = 0;
-- 
1.5.6.5

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel