Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
On 10-07-21 11:19, Sunil Mushran wrote:
> So I discussed this problem with Srini. Yes, your patch is on the right
> track. Just that it needs to be rebased atop Srini's patch.

It seems that Srini's patch is not committed to mainline yet.

regards,
wengang.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
On 07/22/2010 03:51 AM, Wengang Wang wrote:
> On 10-07-21 11:19, Sunil Mushran wrote:
>> So I discussed this problem with Srini. Yes, your patch is on the
>> right track. Just that it needs to be rebased atop Srini's patch.
>
> It seems that Srini's patch is not committed to mainline yet.

No, it is not as yet. But it should be in the queue. Just get the patch
from Srini and rebase your patch atop it.
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
On 10-07-20 15:33, Sunil Mushran wrote:
> On 07/19/2010 07:59 PM, Wengang Wang wrote:
>>> Do you have the message sequencing that would lead to this situation?
>>> If we migrate the lockres to the reco master, the reco master will
>>> send an assert that will make us change the master.
>>
>> So first, the problem is not about the changing owner. It is that the
>> bit (in the refmap) on behalf of the node in question is not cleared
>> on the new master (the recovery master), so the new master will fail
>> at purging the lockres due to the incorrect bit in the refmap.
>>
>> Second, I have no messages at hand for the situation. But I think it
>> is simple enough:
>>
>> 1) node A has no interest in lockres A any longer, so it is purging it.
>> 2) the owner of lockres A is node B, so node A is sending a deref
>>    message to node B.
>> 3) at this time, node B crashed. node C becomes the recovery master.
>>    it recovers lockres A (because the master is the dead node B).
>> 4) node A migrated lockres A to node C with a refbit there.
>> 5) node A failed to send the deref message to node B because it
>>    crashed. The failure is ignored. no other action is done for
>>    lockres A any more.
>
> In dlm_do_local_recovery_cleanup(), we explicitly clear the flag...
> when the owner is the dead_node. So this should not happen.

It reproduces in my test env. Clearing the flag DLM_LOCK_RES_DROPPING_REF
doesn't prevent anything. dlm_do_local_recovery_cleanup() continues to
move the lockres to the recovery list:

2337                /* the wake_up for this will happen when the
2338                 * RECOVERING flag is dropped later */
2339                res->state &= ~DLM_LOCK_RES_DROPPING_REF;
2340
2341                dlm_move_lockres_to_recovery_list(dlm, res);

and dlm_purge_lockres() continues to unhash the lockres:

202             ret = dlm_drop_lockres_ref(dlm, res);
203             if (ret < 0) {
204                     mlog_errno(ret);
205                     if (!dlm_is_host_down(ret))
206                             BUG();
207             }
208             mlog(0, "%s:%.*s: dlm_deref_lockres returned %d\n",
209                  dlm->name, res->lockname.len, res->lockname.name, ret);
210             spin_lock(&dlm->spinlock);
211     }
212
213     spin_lock(&res->spinlock);
214     if (!list_empty(&res->purge)) {
215             mlog(0, "removing lockres %.*s:%p from purgelist, "
216                  "master = %d\n", res->lockname.len, res->lockname.name,
217                  res, master);
218             list_del_init(&res->purge);
219             spin_unlock(&res->spinlock);
220             dlm_lockres_put(res);
221             dlm->purge_count--;
222     } else
223             spin_unlock(&res->spinlock);
224
225     __dlm_unhash_lockres(res);

> Your patch changes the logic to exclude such lockres' from the recovery
> list. And that's a change that, while possibly workable, needs to be
> looked into more thoroughly. In short, there is a disconnect between
> your description and your patch. Or, my understanding.

Normally, we would recover the lockres to the recovery master, and then
re-send the deref message to it. That is what my previous patches do.
After discussing with Srini, we found that ignoring the failure of the
deref to the original master and not recovering the lockres to the
recovery master has the same effect. And it's simpler. The patch fixes
the bug per my test result.

regards,
wengang.
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
On 07/19/2010 07:59 PM, Wengang Wang wrote:
>> Do you have the message sequencing that would lead to this situation?
>> If we migrate the lockres to the reco master, the reco master will
>> send an assert that will make us change the master.
>
> So first, the problem is not about the changing owner. It is that the
> bit (in the refmap) on behalf of the node in question is not cleared
> on the new master (the recovery master), so the new master will fail
> at purging the lockres due to the incorrect bit in the refmap.
>
> Second, I have no messages at hand for the situation. But I think it
> is simple enough:
>
> 1) node A has no interest in lockres A any longer, so it is purging it.
> 2) the owner of lockres A is node B, so node A is sending a deref
>    message to node B.
> 3) at this time, node B crashed. node C becomes the recovery master.
>    it recovers lockres A (because the master is the dead node B).
> 4) node A migrated lockres A to node C with a refbit there.
> 5) node A failed to send the deref message to node B because it
>    crashed. The failure is ignored. no other action is done for
>    lockres A any more.

In dlm_do_local_recovery_cleanup(), we explicitly clear the flag...
when the owner is the dead_node. So this should not happen.

Your patch changes the logic to exclude such lockres' from the recovery
list. And that's a change that, while possibly workable, needs to be
looked into more thoroughly. In short, there is a disconnect between
your description and your patch. Or, my understanding.

> So node A means to drop the ref on the master. But in such a situation,
> node C keeps the ref on behalf of node A unexpectedly. Node C finally
> fails at purging lockres A and hangs on umount.
>
>> I think your problem is the one race we have concerning
>> reco/migration. If so, this fix is not enough.
>
> It's a problem of purging + recovery. no pure migration for umount
> here. So what's your concern?

See above.
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
Any comment?

On 10-06-11 00:25, Wengang Wang wrote:
> If the dlm recovery goes on the non-master node where purging work is
> going on, there could be unexpected references left on some lockres'
> on the recovery master. That is because we migrated the lockres' to
> the recovery master but didn't send deref requests to it accordingly
> (we were sending to the dead original master or to the UNKNOWN).
>
> Fix: For a lockres which is in progress of dropping its reference, we
> don't migrate it to the recovery master, and we unhash the lockres in
> the purge work. For those not in progress of the dropping, delay the
> purge work until recovery has finished so that it can send the deref
> request to the correct master (the recovery master) later.
>
> Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
> ---
>  fs/ocfs2/dlm/dlmrecovery.c |   17 +++++++++++++++--
>  fs/ocfs2/dlm/dlmthread.c   |   36 ++++++++++++++++++++++--------------
>  2 files changed, 37 insertions(+), 16 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index f8b75ce..43530ce 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
>  	struct list_head *queue;
>  	struct dlm_lock *lock, *next;
>  
> +	assert_spin_locked(&dlm->spinlock);
> +	assert_spin_locked(&res->spinlock);
>  	res->state |= DLM_LOCK_RES_RECOVERING;
>  	if (!list_empty(&res->recovering)) {
>  		mlog(0,
> @@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
>  				/* the wake_up for this will happen when the
>  				 * RECOVERING flag is dropped later */
> -				res->state &= ~DLM_LOCK_RES_DROPPING_REF;
> +				if (res->state & DLM_LOCK_RES_DROPPING_REF) {
> +					/*
> +					 * don't migrate a lockres which is in
> +					 * progress of dropping ref
> +					 */
> +					mlog(ML_NOTICE, "%.*s ignored for "
> +					     "migration\n", res->lockname.len,
> +					     res->lockname.name);
> +					res->state &=
> +						~DLM_LOCK_RES_DROPPING_REF;
> +				} else
> +					dlm_move_lockres_to_recovery_list(dlm,
> +									  res);
>  
> -				dlm_move_lockres_to_recovery_list(dlm, res);
>  			} else if (res->owner == dlm->node_num) {
>  				dlm_free_dead_locks(dlm, res, dead_node);
>  				__dlm_lockres_calc_usage(dlm, res);
> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> index d4f73ca..0771420 100644
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
>   * truly ready to be freed. */
>  int __dlm_lockres_unused(struct dlm_lock_resource *res)
>  {
> -	if (!__dlm_lockres_has_locks(res) &&
> -	    (list_empty(&res->dirty) && !(res->state & DLM_LOCK_RES_DIRTY))) {
> -		/* try not to scan the bitmap unless the first two
> -		 * conditions are already true */
> -		int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> -		if (bit >= O2NM_MAX_NODES) {
> -			/* since the bit for dlm->node_num is not
> -			 * set, inflight_locks better be zero */
> -			BUG_ON(res->inflight_locks != 0);
> -			return 1;
> -		}
> +	int bit;
> +
> +	if (__dlm_lockres_has_locks(res))
> +		return 0;
> +
> +	if (!list_empty(&res->dirty) || res->state & DLM_LOCK_RES_DIRTY)
> +		return 0;
> +
> +	if (res->state & DLM_LOCK_RES_RECOVERING)
> +		return 0;
> +
> +	bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> +	if (bit >= O2NM_MAX_NODES) {
> +		/* since the bit for dlm->node_num is not
> +		 * set, inflight_locks better be zero */
> +		BUG_ON(res->inflight_locks != 0);
> +		return 1;
> +	}
>  	return 0;
>  }
> @@ -158,6 +164,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>  	int master;
>  	int ret = 0;
>  
> +	assert_spin_locked(&dlm->spinlock);
> +
>  	spin_lock(&res->spinlock);
>  	if (!__dlm_lockres_unused(res)) {
>  		mlog(0, "%s:%.*s: tried to purge but not unused\n",
> @@ -216,13 +224,13 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>  		     "master = %d\n", res->lockname.len, res->lockname.name,
>  		     res, master);
>  		list_del_init(&res->purge);
> -		spin_unlock(&res->spinlock);
> +		/* not the last ref */
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
Srini has a patch that handles a race concerning DLM_LOCK_RES_DROPPING_REF.
You may want to look at this afresh with that patch.

My problem with this patch is that the description is not clear.

> That is because we migrated the lockres' to the recovery master but
> didn't send deref requests to it accordingly (we were sending to the
> dead original master or to the UNKNOWN).

Do you have the message sequencing that would lead to this situation?
If we migrate the lockres to the reco master, the reco master will send
an assert that will make us change the master.

I think your problem is the one race we have concerning reco/migration.
If so, this fix is not enough.

Sunil

On 07/19/2010 03:09 AM, Wengang Wang wrote:
> Any comment?
>
> On 10-06-11 00:25, Wengang Wang wrote:
>> If the dlm recovery goes on the non-master node where purging work is
>> going on, there could be unexpected references left on some lockres'
>> on the recovery master. That is because we migrated the lockres' to
>> the recovery master but didn't send deref requests to it accordingly
>> (we were sending to the dead original master or to the UNKNOWN).
>>
>> Fix: For a lockres which is in progress of dropping its reference, we
>> don't migrate it to the recovery master, and we unhash the lockres in
>> the purge work. For those not in progress of the dropping, delay the
>> purge work until recovery has finished so that it can send the deref
>> request to the correct master (the recovery master) later.
>> [patch snipped]
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
On 10-07-19 16:52, Sunil Mushran wrote:
> Srini has a patch that handles a race concerning
> DLM_LOCK_RES_DROPPING_REF. You may want to look at this afresh with
> that patch.
>
> My problem with this patch is that the description is not clear.
>
>> That is because we migrated the lockres' to the recovery master but
>> didn't send deref requests to it accordingly (we were sending to the
>> dead original master or to the UNKNOWN).
>
> Do you have the message sequencing that would lead to this situation?
> If we migrate the lockres to the reco master, the reco master will
> send an assert that will make us change the master.

So first, the problem is not about the changing owner. It is that the
bit (in the refmap) on behalf of the node in question is not cleared on
the new master (the recovery master), so the new master will fail at
purging the lockres due to the incorrect bit in the refmap.

Second, I have no messages at hand for the situation. But I think it is
simple enough:

1) node A has no interest in lockres A any longer, so it is purging it.
2) the owner of lockres A is node B, so node A is sending a deref
   message to node B.
3) at this time, node B crashed. node C becomes the recovery master.
   it recovers lockres A (because the master is the dead node B).
4) node A migrated lockres A to node C with a refbit there.
5) node A failed to send the deref message to node B because it crashed.
   The failure is ignored. no other action is done for lockres A any
   more.

So node A means to drop the ref on the master. But in such a situation,
node C keeps the ref on behalf of node A unexpectedly. Node C finally
fails at purging lockres A and hangs on umount.

> I think your problem is the one race we have concerning reco/migration.
> If so, this fix is not enough.

It's a problem of purging + recovery. no pure migration for umount here.
So what's your concern?

regards,
wengang.

> Sunil
>
> On 07/19/2010 03:09 AM, Wengang Wang wrote:
>> Any comment?
>> [patch snipped]
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
Any comment?

On 10-06-25 09:55, Wengang Wang wrote:
> Hi,
>
> Any comment on this?
>
> regards,
> wengang.
>
> On 10-06-11 00:25, Wengang Wang wrote:
>> If the dlm recovery goes on the non-master node where purging work is
>> going on, there could be unexpected references left on some lockres'
>> on the recovery master. That is because we migrated the lockres' to
>> the recovery master but didn't send deref requests to it accordingly
>> (we were sending to the dead original master or to the UNKNOWN).
>>
>> Fix: For a lockres which is in progress of dropping its reference, we
>> don't migrate it to the recovery master, and we unhash the lockres in
>> the purge work. For those not in progress of the dropping, delay the
>> purge work until recovery has finished so that it can send the deref
>> request to the correct master (the recovery master) later.
>>
>> [patch snipped]