Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master

2010-06-08 Thread Wengang Wang
On 10-06-03 18:06, Srinivas Eeda wrote:
 Comments inline
 
 On 6/3/2010 9:37 AM, Wengang Wang wrote:
 Changes to V1:
 1 move the msleep to the second run, when the lockres is in recovery, so the
   purging work on other lockres' can proceed.
 2 do not inform recovery master if DLM_LOCK_RES_DROPPING_REF is set and don't
   resend deref in this case.
 
 Signed-off-by: Wengang Wang wen.gang.w...@oracle.com
 ---
  fs/ocfs2/dlm/dlmcommon.h   |1 +
  fs/ocfs2/dlm/dlmrecovery.c |   25 +++
  fs/ocfs2/dlm/dlmthread.c   |   73 ++-
  3 files changed, 90 insertions(+), 9 deletions(-)
 
 diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
 index 4b6ae2c..4194087 100644
 --- a/fs/ocfs2/dlm/dlmcommon.h
 +++ b/fs/ocfs2/dlm/dlmcommon.h
 @@ -280,6 +280,7 @@ static inline void __dlm_set_joining_node(struct dlm_ctxt *dlm,
  #define DLM_LOCK_RES_IN_PROGRESS  0x0010
  #define DLM_LOCK_RES_MIGRATING0x0020
  #define DLM_LOCK_RES_DROPPING_REF 0x0040
 +#define DLM_LOCK_RES_DE_DROP_REF  0x0080
 Can you please explain the idea of the new flag DLM_LOCK_RES_DE_DROP_REF :)
 

I discussed this with Srini privately; we both agreed that we should not
purge a lockres while it is in the recovery state, which avoids adding the
new flag.
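
For illustration, a minimal sketch of that direction (the helper name
dlm_lockres_purgeable is hypothetical and this is not the revised patch; it
assumes the caller holds res->spinlock, as dlm_purge_lockres does today):

	/* defer purging while recovery owns the lockres; no new flag needed */
	static int dlm_lockres_purgeable(struct dlm_lock_resource *res)
	{
		assert_spin_locked(&res->spinlock);

		if (res->state & DLM_LOCK_RES_RECOVERING)
			return 0;	/* leave it on the purge list for now */
		return 1;
	}

The purge loop would simply skip such a lockres and revisit it on a later
pass, once recovery has cleared DLM_LOCK_RES_RECOVERING and set the new owner.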

I will send the revised patch later.

regards,
wengang.



Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master

2010-06-05 Thread Srinivas Eeda

On 6/3/2010 10:37 PM, Wengang Wang wrote:

Srini,

On 10-06-03 19:17, Srinivas Eeda wrote:
  

Can you please explain the idea of the new flag
DLM_LOCK_RES_DE_DROP_REF :)

If the idea of the fix is to address the race between purging and
recovery, I am wondering whether the DLM_LOCK_RES_DROPPING_REF and
DLM_LOCK_RES_RECOVERING flags may be enough to fix this problem.
dlm_move_lockres_to_recovery_list moves the lockres to the resources
list (which tracks resources that need recovery) and sets the flag
DLM_LOCK_RES_RECOVERING. If we do not call
dlm_move_lockres_to_recovery_list for resources which have
DLM_LOCK_RES_DROPPING_REF set, they will not get migrated. In that
case DLM_LOCK_RES_RECOVERING will not get set, the recovery
master won't know about them, and a lockres that is in the
middle of purging will get purged.

Lockres' that did get moved to the resources list will get
migrated. In that case the lockres has the DLM_LOCK_RES_RECOVERING
flag set, so dlm_purge_list should consider it as in use and
defer purging. The lockres will get recovered, the new owner will
be set, and the DLM_LOCK_RES_RECOVERING flag will get removed;
dlm_purge_list can then go ahead and purge this lockres.



I am following your idea. In addition to it, we also noticed
that we shouldn't send the DEREF request to the recovery master if we
don't migrate the lockres to the recovery master (otherwise, another
BUG() is triggered). DLM_LOCK_RES_DE_DROP_REF is for that purpose:
when we skip migrating a lockres, we set this state.

  

The case where we don't migrate the lockres is only when it's dropping the
reference, right (when DLM_LOCK_RES_DROPPING_REF is set)? In that case we
just unhash and free the lockres.



How do you determine, in code, whether it's in that case? I determine it by
checking the DLM_LOCK_RES_DE_DROP_REF state.
  
If DLM_LOCK_RES_DROPPING_REF is not set for the lockres, then it can get 
migrated (even if it's on the purge list).

regards,
wengang.
  

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master

2010-06-03 Thread Srinivas Eeda
Comments inline

On 6/3/2010 9:37 AM, Wengang Wang wrote:
 Changes to V1:
 1 move the msleep to the second run, when the lockres is in recovery, so the
   purging work on other lockres' can proceed.
 2 do not inform recovery master if DLM_LOCK_RES_DROPPING_REF is set and don't
   resend deref in this case.

 Signed-off-by: Wengang Wang wen.gang.w...@oracle.com
 ---
  fs/ocfs2/dlm/dlmcommon.h   |1 +
  fs/ocfs2/dlm/dlmrecovery.c |   25 +++
  fs/ocfs2/dlm/dlmthread.c   |   73 ++-
  3 files changed, 90 insertions(+), 9 deletions(-)

 diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
 index 4b6ae2c..4194087 100644
 --- a/fs/ocfs2/dlm/dlmcommon.h
 +++ b/fs/ocfs2/dlm/dlmcommon.h
 @@ -280,6 +280,7 @@ static inline void __dlm_set_joining_node(struct dlm_ctxt *dlm,
  #define DLM_LOCK_RES_IN_PROGRESS  0x0010
  #define DLM_LOCK_RES_MIGRATING0x0020
  #define DLM_LOCK_RES_DROPPING_REF 0x0040
 +#define DLM_LOCK_RES_DE_DROP_REF  0x0080
   
Can you please explain the idea of the new flag DLM_LOCK_RES_DE_DROP_REF :)

If the idea of the fix is to address the race between purging and 
recovery, I am wondering whether the DLM_LOCK_RES_DROPPING_REF and 
DLM_LOCK_RES_RECOVERING flags may be enough to fix this problem. 
dlm_move_lockres_to_recovery_list moves the lockres to the resources list 
(which tracks resources that need recovery) and sets the flag 
DLM_LOCK_RES_RECOVERING. If we do not call 
dlm_move_lockres_to_recovery_list for resources which have 
DLM_LOCK_RES_DROPPING_REF set, they will not get migrated. In that case 
DLM_LOCK_RES_RECOVERING will not get set, the recovery master 
won't know about them, and a lockres that is in the middle of 
purging will get purged.

Lockres' that did get moved to the resources list will get migrated. 
In that case the lockres has the DLM_LOCK_RES_RECOVERING flag set, so 
dlm_purge_list should consider it as in use and defer purging. 
The lockres will get recovered, the new owner will be set, 
and the DLM_LOCK_RES_RECOVERING flag will get removed; dlm_purge_list 
can then go ahead and purge this lockres.
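
A minimal sketch of that two-flag interaction (illustrative only; the
purge-side check below stands in for whatever dlm_purge_list/dlmthread.c
would actually do):

	/* recovery side, res->spinlock held: leave a dying lockres alone */
	if (res->state & DLM_LOCK_RES_DROPPING_REF) {
		/* mid-purge: don't move to the recovery list, don't migrate */
	} else {
		/* sets DLM_LOCK_RES_RECOVERING */
		dlm_move_lockres_to_recovery_list(dlm, res);
	}

	/* purge side, res->spinlock held: a recovering lockres is in use */
	if (res->state & DLM_LOCK_RES_RECOVERING) {
		/* defer purging until recovery clears the flag */
	}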
  #define DLM_LOCK_RES_BLOCK_DIRTY  0x1000
  #define DLM_LOCK_RES_SETREF_INPROG0x2000
  
 diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
 index f8b75ce..7241070 100644
 --- a/fs/ocfs2/dlm/dlmrecovery.c
 +++ b/fs/ocfs2/dlm/dlmrecovery.c
 @@ -913,6 +913,27 @@ static void dlm_request_all_locks_worker(struct dlm_work_item *item, void *data)
   /* any errors returned will be due to the new_master dying,
* the dlm_reco_thread should detect this */
 	list_for_each_entry(res, &resources, recovering) {
 +		int ignore_mig = 0;
 +		spin_lock(&res->spinlock);
 +		/*
 +		 * if we are dropping the lockres, no need to let the new master
 +		 * know the reference of this node. That is don't migrate the
 +		 * lockres to the new master. Also make sure we don't send DEREF
 +		 * request for the same lockres to the new master either.
 +		 */
 +		if (unlikely(res->state & DLM_LOCK_RES_DROPPING_REF)) {
 +			ignore_mig = 1;
 +			res->state |= DLM_LOCK_RES_DE_DROP_REF;
 +		}
 +		spin_unlock(&res->spinlock);
 +		if (ignore_mig) {
 +			mlog(ML_NOTICE, "ignore migrating %.*s to recovery "
 +			     "master %u as we are dropping it\n",
 +			     res->lockname.len, res->lockname.name,
 +			     reco_master);
 +			continue;
 +		}
 +
 		ret = dlm_send_one_lockres(dlm, res, mres, reco_master,
 					   DLM_MRES_RECOVERY);
 		if (ret < 0) {
 @@ -1997,7 +2018,11 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
 	struct list_head *queue;
 	struct dlm_lock *lock, *next;
 
 +	assert_spin_locked(&res->spinlock);
 +
 	res->state |= DLM_LOCK_RES_RECOVERING;
 +	res->state &= ~DLM_LOCK_RES_DE_DROP_REF;
 +
 	if (!list_empty(&res->recovering)) {
 		mlog(0,
 		     "Recovering res %s:%.*s, is already on recovery list!\n",
 diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
 index d4f73ca..c4aa2ec 100644
 --- a/fs/ocfs2/dlm/dlmthread.c
 +++ b/fs/ocfs2/dlm/dlmthread.c
 @@ -157,6 +157,9 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
  {
 	int master;
 	int ret = 0;
 +	int remote_drop = 1;
 +
 +	assert_spin_locked(&dlm->spinlock);
 
 	spin_lock(&res->spinlock);
 	if (!__dlm_lockres_unused(res)) {
 @@ -184,12 +187,19 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
 
 	if (!master)
 		res->state |= DLM_LOCK_RES_DROPPING_REF;
 +
 +	/*
 +	 * If we didn't migrate this lockres to recovery master, don't send
 +	 * 

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master

2010-06-03 Thread Wengang Wang
Srini,

On 10-06-03 18:06, Srinivas Eeda wrote:
 Comments inline
 
 On 6/3/2010 9:37 AM, Wengang Wang wrote:
 Changes to V1:
 1 move the msleep to the second run, when the lockres is in recovery, so the
   purging work on other lockres' can proceed.
 2 do not inform recovery master if DLM_LOCK_RES_DROPPING_REF is set and don't
   resend deref in this case.
 
 Signed-off-by: Wengang Wang wen.gang.w...@oracle.com
 ---
  fs/ocfs2/dlm/dlmcommon.h   |1 +
  fs/ocfs2/dlm/dlmrecovery.c |   25 +++
  fs/ocfs2/dlm/dlmthread.c   |   73 ++-
  3 files changed, 90 insertions(+), 9 deletions(-)
 
 diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
 index 4b6ae2c..4194087 100644
 --- a/fs/ocfs2/dlm/dlmcommon.h
 +++ b/fs/ocfs2/dlm/dlmcommon.h
 @@ -280,6 +280,7 @@ static inline void __dlm_set_joining_node(struct dlm_ctxt *dlm,
  #define DLM_LOCK_RES_IN_PROGRESS  0x0010
  #define DLM_LOCK_RES_MIGRATING0x0020
  #define DLM_LOCK_RES_DROPPING_REF 0x0040
 +#define DLM_LOCK_RES_DE_DROP_REF  0x0080
 Can you please explain the idea of the new flag DLM_LOCK_RES_DE_DROP_REF :)
 
 If the idea of the fix is to address the race between purging and
 recovery, I am wondering whether the DLM_LOCK_RES_DROPPING_REF and
 DLM_LOCK_RES_RECOVERING flags may be enough to fix this problem.
 dlm_move_lockres_to_recovery_list moves the lockres to the resources
 list (which tracks resources that need recovery) and sets the flag
 DLM_LOCK_RES_RECOVERING. If we do not call
 dlm_move_lockres_to_recovery_list for resources which have
 DLM_LOCK_RES_DROPPING_REF set, they will not get migrated. In that
 case DLM_LOCK_RES_RECOVERING will not get set, the recovery
 master won't know about them, and a lockres that is in the
 middle of purging will get purged.
 
 Lockres' that did get moved to the resources list will get
 migrated. In that case the lockres has the DLM_LOCK_RES_RECOVERING
 flag set, so dlm_purge_list should consider it as in use and
 defer purging. The lockres will get recovered, the new owner will
 be set, and the DLM_LOCK_RES_RECOVERING flag will get removed;
 dlm_purge_list can then go ahead and purge this lockres.

I am following your idea. In addition to it, we also noticed
that we shouldn't send the DEREF request to the recovery master if we
don't migrate the lockres to the recovery master (otherwise, another
BUG() is triggered). DLM_LOCK_RES_DE_DROP_REF is for that purpose:
when we skip migrating a lockres, we set this state.

  #define DLM_LOCK_RES_BLOCK_DIRTY  0x1000
  #define DLM_LOCK_RES_SETREF_INPROG0x2000
 diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
 index f8b75ce..7241070 100644
 --- a/fs/ocfs2/dlm/dlmrecovery.c
 +++ b/fs/ocfs2/dlm/dlmrecovery.c
 @@ -913,6 +913,27 @@ static void dlm_request_all_locks_worker(struct dlm_work_item *item, void *data)
  /* any errors returned will be due to the new_master dying,
   * the dlm_reco_thread should detect this */
 	list_for_each_entry(res, &resources, recovering) {
 +		int ignore_mig = 0;
 +		spin_lock(&res->spinlock);
 +		/*
 +		 * if we are dropping the lockres, no need to let the new master
 +		 * know the reference of this node. That is don't migrate the
 +		 * lockres to the new master. Also make sure we don't send DEREF
 +		 * request for the same lockres to the new master either.
 +		 */
 +		if (unlikely(res->state & DLM_LOCK_RES_DROPPING_REF)) {
 +			ignore_mig = 1;
 +			res->state |= DLM_LOCK_RES_DE_DROP_REF;
 +		}
 +		spin_unlock(&res->spinlock);
 +		if (ignore_mig) {
 +			mlog(ML_NOTICE, "ignore migrating %.*s to recovery "
 +			     "master %u as we are dropping it\n",
 +			     res->lockname.len, res->lockname.name,
 +			     reco_master);
 +			continue;
 +		}
 +
 		ret = dlm_send_one_lockres(dlm, res, mres, reco_master,
 					   DLM_MRES_RECOVERY);
 		if (ret < 0) {
 @@ -1997,7 +2018,11 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
 	struct list_head *queue;
 	struct dlm_lock *lock, *next;
 
 +	assert_spin_locked(&res->spinlock);
 +
 	res->state |= DLM_LOCK_RES_RECOVERING;
 +	res->state &= ~DLM_LOCK_RES_DE_DROP_REF;
 +
 	if (!list_empty(&res->recovering)) {
 		mlog(0,
 		     "Recovering res %s:%.*s, is already on recovery list!\n",
 diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
 index d4f73ca..c4aa2ec 100644
 --- a/fs/ocfs2/dlm/dlmthread.c
 +++ b/fs/ocfs2/dlm/dlmthread.c
 @@ -157,6 +157,9 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
  {
 	int master;
 	int ret = 0;
 +	int remote_drop = 1;
 +
 +

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master

2010-06-03 Thread Srinivas Eeda

On 6/3/2010 6:43 PM, Wengang Wang wrote:

Srini,

On 10-06-03 18:06, Srinivas Eeda wrote:
  

Comments inline

On 6/3/2010 9:37 AM, Wengang Wang wrote:


Changes to V1:
1 move the msleep to the second run, when the lockres is in recovery, so the
 purging work on other lockres' can proceed.
2 do not inform recovery master if DLM_LOCK_RES_DROPPING_REF is set and don't
 resend deref in this case.

Signed-off-by: Wengang Wang wen.gang.w...@oracle.com
---
fs/ocfs2/dlm/dlmcommon.h   |1 +
fs/ocfs2/dlm/dlmrecovery.c |   25 +++
fs/ocfs2/dlm/dlmthread.c   |   73 ++-
3 files changed, 90 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 4b6ae2c..4194087 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -280,6 +280,7 @@ static inline void __dlm_set_joining_node(struct dlm_ctxt *dlm,
#define DLM_LOCK_RES_IN_PROGRESS  0x0010
#define DLM_LOCK_RES_MIGRATING0x0020
#define DLM_LOCK_RES_DROPPING_REF 0x0040
+#define DLM_LOCK_RES_DE_DROP_REF  0x0080
  

Can you please explain the idea of the new flag DLM_LOCK_RES_DE_DROP_REF :)

If the idea of the fix is to address the race between purging and
recovery, I am wondering whether the DLM_LOCK_RES_DROPPING_REF and
DLM_LOCK_RES_RECOVERING flags may be enough to fix this problem.
dlm_move_lockres_to_recovery_list moves the lockres to the resources
list (which tracks resources that need recovery) and sets the flag
DLM_LOCK_RES_RECOVERING. If we do not call
dlm_move_lockres_to_recovery_list for resources which have
DLM_LOCK_RES_DROPPING_REF set, they will not get migrated. In that
case DLM_LOCK_RES_RECOVERING will not get set, the recovery
master won't know about them, and a lockres that is in the
middle of purging will get purged.

Lockres' that did get moved to the resources list will get
migrated. In that case the lockres has the DLM_LOCK_RES_RECOVERING
flag set, so dlm_purge_list should consider it as in use and
defer purging. The lockres will get recovered, the new owner will
be set, and the DLM_LOCK_RES_RECOVERING flag will get removed;
dlm_purge_list can then go ahead and purge this lockres.



I am following your idea. In addition to it, we also noticed
that we shouldn't send the DEREF request to the recovery master if we
don't migrate the lockres to the recovery master (otherwise, another
BUG() is triggered). DLM_LOCK_RES_DE_DROP_REF is for that purpose:
when we skip migrating a lockres, we set this state.
  
The case where we don't migrate the lockres is only when it's dropping the 
reference, right (when DLM_LOCK_RES_DROPPING_REF is set)? In that case we 
just unhash and free the lockres.
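
Something like this, with dlm->spinlock and res->spinlock held as in
dlm_purge_lockres (a sketch only, not the actual patch):

	/* not migrated; we were already dropping our ref, so just
	 * pull it out of the hash and free it */
	res->state &= ~DLM_LOCK_RES_DROPPING_REF;
	list_del_init(&res->purge);
	__dlm_unhash_lockres(res);
	dlm_lockres_put(res);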

#define DLM_LOCK_RES_BLOCK_DIRTY  0x1000
#define DLM_LOCK_RES_SETREF_INPROG0x2000
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index f8b75ce..7241070 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -913,6 +913,27 @@ static void dlm_request_all_locks_worker(struct dlm_work_item *item, void *data)
/* any errors returned will be due to the new_master dying,
 * the dlm_reco_thread should detect this */
 	list_for_each_entry(res, &resources, recovering) {
+		int ignore_mig = 0;
+		spin_lock(&res->spinlock);
+		/*
+		 * if we are dropping the lockres, no need to let the new master
+		 * know the reference of this node. That is don't migrate the
+		 * lockres to the new master. Also make sure we don't send DEREF
+		 * request for the same lockres to the new master either.
+		 */
+		if (unlikely(res->state & DLM_LOCK_RES_DROPPING_REF)) {
+			ignore_mig = 1;
+			res->state |= DLM_LOCK_RES_DE_DROP_REF;
+		}
+		spin_unlock(&res->spinlock);
+		if (ignore_mig) {
+			mlog(ML_NOTICE, "ignore migrating %.*s to recovery "
+			     "master %u as we are dropping it\n",
+			     res->lockname.len, res->lockname.name,
+			     reco_master);
+			continue;
+		}
+
 		ret = dlm_send_one_lockres(dlm, res, mres, reco_master,
 					   DLM_MRES_RECOVERY);
 		if (ret < 0) {
@@ -1997,7 +2018,11 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
 	struct list_head *queue;
 	struct dlm_lock *lock, *next;
 
+	assert_spin_locked(&res->spinlock);
+
 	res->state |= DLM_LOCK_RES_RECOVERING;
+	res->state &= ~DLM_LOCK_RES_DE_DROP_REF;
+
 	if (!list_empty(&res->recovering)) {
 		mlog(0,
 		     "Recovering res %s:%.*s, is already on recovery list!\n",
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index 

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master

2010-06-03 Thread Wengang Wang
Srini,

On 10-06-03 19:17, Srinivas Eeda wrote:
 Can you please explain the idea of the new flag
 DLM_LOCK_RES_DE_DROP_REF :)

 If the idea of the fix is to address the race between purging and
 recovery, I am wondering whether the DLM_LOCK_RES_DROPPING_REF and
 DLM_LOCK_RES_RECOVERING flags may be enough to fix this problem.
 dlm_move_lockres_to_recovery_list moves the lockres to the resources
 list (which tracks resources that need recovery) and sets the flag
 DLM_LOCK_RES_RECOVERING. If we do not call
 dlm_move_lockres_to_recovery_list for resources which have
 DLM_LOCK_RES_DROPPING_REF set, they will not get migrated. In that
 case DLM_LOCK_RES_RECOVERING will not get set, the recovery
 master won't know about them, and a lockres that is in the
 middle of purging will get purged.
 
 Lockres' that did get moved to the resources list will get
 migrated. In that case the lockres has the DLM_LOCK_RES_RECOVERING
 flag set, so dlm_purge_list should consider it as in use and
 defer purging. The lockres will get recovered, the new owner will
 be set, and the DLM_LOCK_RES_RECOVERING flag will get removed;
 dlm_purge_list can then go ahead and purge this lockres.


 I am following your idea. In addition to it, we also noticed
 that we shouldn't send the DEREF request to the recovery master if we
 don't migrate the lockres to the recovery master (otherwise, another
 BUG() is triggered). DLM_LOCK_RES_DE_DROP_REF is for that purpose:
 when we skip migrating a lockres, we set this state.

The case where we don't migrate the lockres is only when it's dropping the
reference, right (when DLM_LOCK_RES_DROPPING_REF is set)? In that case we
just unhash and free the lockres.

How do you determine, in code, whether it's in that case? I determine it by
checking the DLM_LOCK_RES_DE_DROP_REF state.
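
Roughly, in the purge path it would look like this (a sketch; the posted
dlmthread.c hunk is truncated in the archive above):

	spin_lock(&res->spinlock);
	if (res->state & DLM_LOCK_RES_DE_DROP_REF)
		/* we skipped migrating this lockres to the recovery master,
		 * so it holds no reference for us: don't send it a DEREF */
		remote_drop = 0;
	spin_unlock(&res->spinlock);

	if (!master && remote_drop)
		ret = dlm_drop_lockres_ref(dlm, res);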

regards,
wengang.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel