Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master
On 10-06-03 18:06, Srinivas Eeda wrote:
> Comments inline
> [...]
> Can you please explain the idea of the new flag DLM_LOCK_RES_DE_DROP_REF :)

Discussed with Srini privately; we both agreed not to purge a lockres while it is in the recovery state, which avoids adding the new flag. I will send the revised patch later.

regards,
wengang.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master
On 6/3/2010 10:37 PM, Wengang Wang wrote:
> Srini,
>
> On 10-06-03 19:17, Srinivas Eeda wrote:
>> [...]
>> How do you determine whether it's in that case in code?
>
> I determine that by checking the DLM_LOCK_RES_DE_DROP_REF state.

If DLM_LOCK_RES_DROPPING_REF is not set for the lockres, then it can get migrated (even if it is on the purge list).

regards,
wengang.
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master
Comments inline

On 6/3/2010 9:37 AM, Wengang Wang wrote:
> Changes to V1:
> 1 move the msleep to the second runs when the lockres is in recovery so
>   the purging work on other lockres' can go.
> 2 do not inform recovery master if DLM_LOCK_RES_DROPPING_REF is set and
>   don't resend deref in this case.
>
> Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
> ---
>  fs/ocfs2/dlm/dlmcommon.h   |  1 +
>  fs/ocfs2/dlm/dlmrecovery.c | 25 +++
>  fs/ocfs2/dlm/dlmthread.c   | 73 ++-
>  3 files changed, 90 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
> index 4b6ae2c..4194087 100644
> --- a/fs/ocfs2/dlm/dlmcommon.h
> +++ b/fs/ocfs2/dlm/dlmcommon.h
> @@ -280,6 +280,7 @@ static inline void __dlm_set_joining_node(struct dlm_ctxt *dlm,
>  #define DLM_LOCK_RES_IN_PROGRESS      0x0010
>  #define DLM_LOCK_RES_MIGRATING        0x0020
>  #define DLM_LOCK_RES_DROPPING_REF     0x0040
> +#define DLM_LOCK_RES_DE_DROP_REF      0x0080

Can you please explain the idea of the new flag DLM_LOCK_RES_DE_DROP_REF :)

If the idea of the fix is to address the race between purging and recovery, I am wondering whether the DLM_LOCK_RES_DROPPING_REF and DLM_LOCK_RES_RECOVERING flags may be enough to fix this problem.

dlm_move_lockres_to_recovery_list moves a lockres to the resources list (which keeps track of resources that need recovery) and sets the DLM_LOCK_RES_RECOVERING flag.

If we do not call dlm_move_lockres_to_recovery_list for a resource that has DLM_LOCK_RES_DROPPING_REF set, it will not get migrated. In that case DLM_LOCK_RES_RECOVERING will not get set, the recovery master won't know about it, and the lockres that is in the middle of purging will get purged.

Lockres' that did get moved to the resources list will get migrated. In that case the lockres has the DLM_LOCK_RES_RECOVERING flag set, so dlm_purge_list should consider it as being in use and defer purging. The lockres will get recovered, the new owner will be set, and the DLM_LOCK_RES_RECOVERING flag will be removed; dlm_purge_list can then go ahead and purge this lockres.

>  #define DLM_LOCK_RES_BLOCK_DIRTY      0x1000
>  #define DLM_LOCK_RES_SETREF_INPROG    0x2000
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index f8b75ce..7241070 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -913,6 +913,27 @@ static void dlm_request_all_locks_worker(struct dlm_work_item *item, void *data)
>  	/* any errors returned will be due to the new_master dying,
>  	 * the dlm_reco_thread should detect this */
>  	list_for_each_entry(res, &resources, recovering) {
> +		int ignore_mig = 0;
> +
> +		spin_lock(&res->spinlock);
> +		/*
> +		 * if we are dropping the lockres, no need to let the new master
> +		 * know the reference of this node. That is don't migrate the
> +		 * lockres to the new master. Also make sure we don't send DEREF
> +		 * request for the same lockres to the new master either.
> +		 */
> +		if (unlikely(res->state & DLM_LOCK_RES_DROPPING_REF)) {
> +			ignore_mig = 1;
> +			res->state |= DLM_LOCK_RES_DE_DROP_REF;
> +		}
> +		spin_unlock(&res->spinlock);
> +		if (ignore_mig) {
> +			mlog(ML_NOTICE, "ignore migrating %.*s to recovery "
> +			     "master %u as we are dropping it\n",
> +			     res->lockname.len, res->lockname.name,
> +			     reco_master);
> +			continue;
> +		}
> +
>  		ret = dlm_send_one_lockres(dlm, res, mres, reco_master,
>  					   DLM_MRES_RECOVERY);
>  		if (ret < 0) {
> @@ -1997,7 +2018,11 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
>  	struct list_head *queue;
>  	struct dlm_lock *lock, *next;
>
> +	assert_spin_locked(&res->spinlock);
> +
>  	res->state |= DLM_LOCK_RES_RECOVERING;
> +	res->state &= ~DLM_LOCK_RES_DE_DROP_REF;
> +
>  	if (!list_empty(&res->recovering)) {
>  		mlog(0,
>  		     "Recovering res %s:%.*s, is already on recovery list!\n",
>
> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> index d4f73ca..c4aa2ec 100644
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -157,6 +157,9 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>  {
>  	int master;
>  	int ret = 0;
> +	int remote_drop = 1;
> +
> +	assert_spin_locked(&dlm->spinlock);
>
>  	spin_lock(&res->spinlock);
>  	if (!__dlm_lockres_unused(res)) {
> @@ -184,12 +187,19 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>  	if (!master)
>  		res->state |= DLM_LOCK_RES_DROPPING_REF;
> +
> +	/*
> +	 * If we didn't migrate this lockres to recovery master, don't send
> +	 *
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master
Srini,

On 10-06-03 18:06, Srinivas Eeda wrote:
> Can you please explain the idea of the new flag DLM_LOCK_RES_DE_DROP_REF :)
>
> If the idea of the fix is to address the race between purging and recovery, I am wondering whether the DLM_LOCK_RES_DROPPING_REF and DLM_LOCK_RES_RECOVERING flags may be enough to fix this problem.
> [...]
> dlm_purge_list can now go ahead and purge this lockres.

I am following your idea. An addition to your idea is that we also noticed we shouldn't send the DEREF request to the recovery master if we don't migrate the lockres to the recovery master (otherwise, another BUG() is triggered). DLM_LOCK_RES_DE_DROP_REF is for that purpose: when we ignore migrating a lockres, we set this state.
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master
On 6/3/2010 6:43 PM, Wengang Wang wrote:
> Srini,
> [...]
> I am following your idea. An addition to your idea is that we also noticed we shouldn't send the DEREF request to the recovery master if we don't migrate the lockres to the recovery master (otherwise, another BUG() is triggered). DLM_LOCK_RES_DE_DROP_REF is for that purpose: when we ignore migrating a lockres, we set this state.

The case where we don't migrate the lockres is only when it's dropping the reference, right (when DLM_LOCK_RES_DROPPING_REF is set)? In that case we just unhash and free the lockres.
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master
Srini,

On 10-06-03 19:17, Srinivas Eeda wrote:
> [...]
> The case where we don't migrate the lockres is only when it's dropping the reference, right (when DLM_LOCK_RES_DROPPING_REF is set)? In that case we just unhash and free the lockres.
>
> How do you determine whether it's in that case in code?

I determine that by checking the DLM_LOCK_RES_DE_DROP_REF state.

regards,
wengang.