Junxiao, thanks for looking into this issue. Please see my comments below.
On 02/24/2014 01:07 AM, Junxiao Bi wrote:
Hi,
On 07/19/2012 09:59 AM, Sunil Mushran wrote:
Different issues.
On Wed, Jul 18, 2012 at 6:34 PM, Junxiao Bi <junxiao...@oracle.com> wrote:
On 07/19/2012 12:36 AM, Sunil Mushran wrote:
This bug was detected during a code audit. I have never seen a crash. If
it does hit, then we have bigger problems, so there is no point posting
it to stable.
I have read a lot of the dlm recovery code recently, and I found that
this bug can happen in the following scenario.
node 1: migrate target node x:
dlm_unregister_domain()
dlm_migrate_all_locks()
dlm_empty_lockres()
select node x as migrate target node
since there is a node x lock on the granted list.
dlm_migrate_lockres()
dlm_mark_lockres_migrating() {
wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
<<< node x unlock may happen here, res->granted list can be empty.
If the unlock request was sent at this point and was *processed*, the
lock must have been removed from the granted_list. If the request was
*not yet processed*, then the DLM_LOCK_RES_MIGRATING flag set in
dlm_lockres_release_ast would make the dlm_unlock handler return
DLM_MIGRATING to the caller (in this case node x). So I don't see how
the granted_list could hold a stale lock. Am I missing something?
I do think the race you point out below exists, but I am not sure it is
caused by the scenario described above.
dlm_lockres_release_ast(dlm, res);
}
dlm_send_one_lockres()
dlm_process_recovery_data() {
    tmpq is res->granted list and is empty.
    list_for_each_entry(lock, tmpq, list) {
        if (lock->ml.cookie != ml->cookie)
            lock = NULL;
        else
            break;
    }
    lock will be invalid here.
    if (lock->ml.node != ml->node)
        BUG() --> crash here.
}
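To make the failure mode in the trace above concrete, here is a minimal user-space sketch, not the kernel code. The toy_* names, struct toy_lock, and the iteration macro with an explicit type argument are all made up for illustration (the kernel's list_for_each_entry() infers the type instead). It shows the idea behind the patch: the loop cursor is only meaningful inside the loop body, so a match has to be recorded separately.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's intrusive list helpers. */
struct list_head {
    struct list_head *next, *prev;
};

#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Like list_for_each_entry(), but with an explicit type argument. */
#define toy_list_for_each_entry(pos, head, type, member)          \
    for (pos = toy_container_of((head)->next, type, member);      \
         &pos->member != (head);                                  \
         pos = toy_container_of(pos->member.next, type, member))

/* Hypothetical, stripped-down lock: just a cookie and a list link. */
struct toy_lock {
    unsigned long long cookie;
    struct list_head list;
};

static void toy_list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void toy_list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/*
 * Return the lock with the given cookie, or NULL. If the loop finishes
 * without a break, the cursor is computed from the list head itself and
 * is not a real toy_lock, so it is never returned or dereferenced.
 */
static struct toy_lock *toy_find_lock(struct list_head *queue,
                                      unsigned long long cookie)
{
    struct toy_lock *lock;
    struct toy_lock *found = NULL;

    toy_list_for_each_entry(lock, queue, struct toy_lock, list) {
        if (lock->cookie == cookie) {
            found = lock;
            break;
        }
    }
    return found;
}

/* Exercise the empty-list case from the scenario, then a hit and a miss. */
static int toy_demo(void)
{
    struct list_head granted;
    struct toy_lock a = { .cookie = 1 }, b = { .cookie = 2 };

    toy_list_init(&granted);
    assert(toy_find_lock(&granted, 1) == NULL);   /* empty list */

    toy_list_add_tail(&a.list, &granted);
    toy_list_add_tail(&b.list, &granted);
    assert(toy_find_lock(&granted, 2) == &b);     /* match */
    assert(toy_find_lock(&granted, 3) == NULL);   /* no match */
    return 0;
}
```

In this sketch the caller only ever sees `found`, so the empty granted list simply yields NULL instead of a pointer derived from the list head.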
Thanks,
Junxiao.
Our customer can reproduce it. Also, I saw you were assigned a
similar bug before; see
https://oss.oracle.com/bugzilla/show_bug.cgi?id=1220. Is it the
same bug?
On Tue, Jul 17, 2012 at 6:36 PM, Junxiao Bi <junxiao...@oracle.com> wrote:
Hi Sunil,
On 07/18/2012 03:49 AM, Sunil Mushran wrote:
On Tue, Jul 17, 2012 at 12:10 AM, Junxiao Bi <junxiao...@oracle.com> wrote:
On the target node of a dlm lock migration, the logic to find the
local dlm lock is wrong: it shouldn't change the loop variable "lock"
inside the list_for_each_entry loop. Doing so causes a NULL-pointer
dereference crash.
Signed-off-by: Junxiao Bi <junxiao...@oracle.com>
Cc: sta...@vger.kernel.org
---
fs/ocfs2/dlm/dlmrecovery.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 01ebfd0..0b9cc88 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1762,6 +1762,7 @@ static int dlm_process_recovery_data(struct dlm_ctxt *dlm,
         u8 from = O2NM_MAX_NODES;
         unsigned int added = 0;
         __be64 c;
+        int found;
         mlog(0, "running %d locks for this lockres\n", mres->num_locks);
         for (i=0; i<mres->num_locks; i++) {
@@ -1793,22 +1794,23 @@ static int dlm_process_recovery_data(struct dlm_ctxt *dlm,
                         /* MIGRATION ONLY! */
                         BUG_ON(!(mres->flags & DLM_MRES_MIGRATION));
+                        found = 0;
                         spin_lock(&res->spinlock);
                         for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                 tmpq = dlm_list_idx_to_ptr(res, j);
                                 list_for_each_entry(lock, tmpq, list) {
-                                        if (lock->ml.cookie != ml->cookie)
-                                                lock = NULL;
-                                        else
+                                        if (lock->ml.cookie == ml->cookie) {
+                                                found = 1;
                                                 break;
+                                        }
                                 }
-                                if (lock)
+                                if (found)
                                         break;
                         }
                         /* lock is always created locally first, and
                          * destroyed locally last. it must be on the list */
-                        if (!lock) {
+                        if (!found) {
                                 c = ml->cookie;
                                 mlog(ML_ERROR, "Could not find local lock "
                                                "with cookie %u:%llu, node %u, "
https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=blobdiff;f=fs/ocfs2/dlm/dlmrecovery.c;h=c881be6043a8c27c26ee44d217fb8ecf1eb37e02;hp=01ebfd0bdad72264b99345378f0c6febe246503d;hb=13279667cc8bbaf901591dee96f762d4aab8b307;hpb=a5ae0116eb56ec7c128e84fe15646a5cb9a8cb47
We had decided to go back to list_for_each().
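For readers following along, going back to a list_for_each()-style loop could look roughly like the user-space sketch below. This is an illustration under assumptions, not the committed kernel change: the demo_* names, struct demo_lock, and the two helper macros are hypothetical stand-ins. The point is that the cursor is a separate struct list_head pointer, so resetting "lock" to NULL inside the loop no longer disturbs the iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's intrusive list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the containing structure from a pointer to its list link. */
#define demo_list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as list_for_each(): the cursor is a plain list_head. */
#define demo_list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

/* Hypothetical, stripped-down lock: just a cookie and a list link. */
struct demo_lock {
    unsigned long long cookie;
    struct list_head list;
};

static void demo_list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void demo_list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/*
 * Return the lock with the given cookie, or NULL. "lock" starts as NULL
 * and is reset to NULL on every mismatch, which is safe here because
 * the loop advances through "iter", not through "lock".
 */
static struct demo_lock *demo_find_lock(struct list_head *queue,
                                        unsigned long long cookie)
{
    struct list_head *iter;
    struct demo_lock *lock = NULL;

    demo_list_for_each(iter, queue) {
        lock = demo_list_entry(iter, struct demo_lock, list);
        if (lock->cookie == cookie)
            break;
        lock = NULL;
    }
    return lock;
}

/* Check the empty list, a match, and a miss. */
static int demo_run(void)
{
    struct list_head granted;
    struct demo_lock a = { .cookie = 5 }, b = { .cookie = 6 };

    demo_list_init(&granted);
    assert(demo_find_lock(&granted, 5) == NULL);  /* empty list */

    demo_list_add_tail(&a.list, &granted);
    demo_list_add_tail(&b.list, &granted);
    assert(demo_find_lock(&granted, 6) == &b);    /* match */
    assert(demo_find_lock(&granted, 7) == NULL);  /* no match */
    return 0;
}
```

With this shape, "lock" is either NULL or a real list entry once the loop ends, so a later `if (!lock)` check is meaningful even when the queue is empty.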
OK, thank you. Reverting is fine for a bug that was introduced,
but I think you should Cc the stable branch.
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel