Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master

2010-07-22 Thread Wengang Wang
On 10-07-21 11:19, Sunil Mushran wrote:
 So I discussed this problem with Srini. Yes, your patch is on the right
 track. Just that it needs to be rebased atop Srini's patch.
 

It seems that Srini's patch is not committed to mainline yet.

regards,
wengang.



Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master

2010-07-22 Thread Sunil Mushran
On 07/22/2010 03:51 AM, Wengang Wang wrote:
 On 10-07-21 11:19, Sunil Mushran wrote:

 So I discussed this problem with Srini. Yes, your patch is on the right
 track. Just that it needs to be rebased atop Srini's patch.

  
 It seems that Srini's patch is not committed to mainline yet.


No, it is not as yet. But should be in the queue. Just get the patch from
Srini and rebase your patch atop it.



Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master

2010-07-21 Thread Wengang Wang
On 10-07-20 15:33, Sunil Mushran wrote:
 On 07/19/2010 07:59 PM, Wengang Wang wrote:
 Do you have the message sequencing that would lead to this situation?
 If we migrate the lockres to the reco master, the reco master will send
 an assert that will make us change the master.
 So first, the problem is not about the changing owner. It is that
 the bit (in the refmap) on behalf of the node in question is not cleared on
 the new master (the recovery master), so the new master will fail at purging
 the lockres due to the incorrect bit in the refmap.
 
 Second, I have no messages at hand for the situation. But I think it is
 simple enough.
 
 1) node A has no interest in lockres A any longer, so it is purging it.
 2) the owner of lockres A is node B, so node A is sending a de-ref message
 to node B.
 3) at this time, node B crashes. node C becomes the recovery master. it
 recovers lockres A (because the master is the dead node B).
 4) node A migrates lockres A to node C with a refbit there.
 5) node A fails to send the de-ref message to node B because it crashed.
 The failure is ignored; no other action is done for lockres A any more.
 
 In dlm_do_local_recovery_cleanup(), we explicitly clear the flag...
 when the owner is the dead_node. So this should not happen.

It reproduces in my test env.

Clearing the flag DLM_LOCK_RES_DROPPING_REF doesn't prevent anything.

dlm_do_local_recovery_cleanup() continues to move the lockres to the recovery
list.

2337             /* the wake_up for this will happen when the
2338              * RECOVERING flag is dropped later */
2339             res->state &= ~DLM_LOCK_RES_DROPPING_REF;
2340 
2341             dlm_move_lockres_to_recovery_list(dlm, res);


and dlm_purge_lockres() continues to unhash the lockres.

202         ret = dlm_drop_lockres_ref(dlm, res);
203         if (ret < 0) {
204             mlog_errno(ret);
205             if (!dlm_is_host_down(ret))
206                 BUG();
207         }
208         mlog(0, "%s:%.*s: dlm_deref_lockres returned %d\n",
209              dlm->name, res->lockname.len, res->lockname.name, ret);
210         spin_lock(&dlm->spinlock);
211     }
212 
213     spin_lock(&res->spinlock);
214     if (!list_empty(&res->purge)) {
215         mlog(0, "removing lockres %.*s:%p from purgelist, "
216              "master = %d\n", res->lockname.len, res->lockname.name,
217              res, master);
218         list_del_init(&res->purge);
219         spin_unlock(&res->spinlock);
220         dlm_lockres_put(res);
221         dlm->purge_count--;
222     } else
223         spin_unlock(&res->spinlock);
224 
225     __dlm_unhash_lockres(res);


 Your patch changes the logic to exclude such lockres' from the
 recovery list. And that's a change that, while possibly workable, needs
 to be looked into more thoroughly.
 
 In short, there is a disconnect between your description and your patch.
 Or, my understanding.

Normally, we would recover the lockres to the recovery master and then re-send
the deref message to it. That is what my previous patches do.

After discussing with Srini, we found that ignoring the failure of the deref to
the original master and not recovering the lockres to the recovery master has
the same effect. And it's simpler.
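
For reference, a condensed sketch of that second approach. It is distilled from
the patch quoted later in this thread and from the existing dlm_purge_lockres()
error path, so take it as illustrative rather than as the exact hunks:

    /* dlm_do_local_recovery_cleanup(): a lockres that is already dropping
     * its ref is not handed to the recovery master; only the flag is cleared */
    if (res->state & DLM_LOCK_RES_DROPPING_REF)
        res->state &= ~DLM_LOCK_RES_DROPPING_REF;
    else
        dlm_move_lockres_to_recovery_list(dlm, res);

    /* dlm_purge_lockres(): a deref that fails because the old master is
     * already down is ignored, and the lockres is unhashed anyway */
    ret = dlm_drop_lockres_ref(dlm, res);
    if (ret < 0 && !dlm_is_host_down(ret))
        BUG();
    ...
    __dlm_unhash_lockres(res);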

The patch fixes the bug per my test result.

regards,
wengang.



Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master

2010-07-20 Thread Sunil Mushran
On 07/19/2010 07:59 PM, Wengang Wang wrote:
 Do you have the message sequencing that would lead to this situation?
 If we migrate the lockres to the reco master, the reco master will send
 an assert that will make us change the master.
  
 So first, the problem is not about the changing owner. It is that
 the bit (in the refmap) on behalf of the node in question is not cleared on
 the new master (the recovery master), so the new master will fail at purging
 the lockres due to the incorrect bit in the refmap.

 Second, I have no messages at hand for the situation. But I think it is simple
 enough.

 1) node A has no interest in lockres A any longer, so it is purging it.
 2) the owner of lockres A is node B, so node A is sending a de-ref message
 to node B.
 3) at this time, node B crashes. node C becomes the recovery master. it
 recovers lockres A (because the master is the dead node B).
 4) node A migrates lockres A to node C with a refbit there.
 5) node A fails to send the de-ref message to node B because it crashed. The
 failure is ignored; no other action is done for lockres A any more.


In dlm_do_local_recovery_cleanup(), we explicitly clear the flag...
when the owner is the dead_node. So this should not happen.
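
(For reference, the cleanup path being referred to, as quoted elsewhere in this
thread; the first comment line is added here only for orientation:)

    /* dlm_do_local_recovery_cleanup(), res->owner == dead_node case */
    /* the wake_up for this will happen when the
     * RECOVERING flag is dropped later */
    res->state &= ~DLM_LOCK_RES_DROPPING_REF;

    dlm_move_lockres_to_recovery_list(dlm, res);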

Your patch changes the logic to exclude such lockres' from the
recovery list. And that's a change that, while possibly workable, needs
to be looked into more thoroughly.

In short, there is a disconnect between your description and your patch.
Or, my understanding.

 So node A means to drop the ref on the master. But in such a situation, node C
 keeps the ref on behalf of node A unexpectedly. Node C finally fails at purging
 lockres A and hangs on umount.


 I think your problem is the one race we have concerning reco/migration.
 If so, this fix is not enough.
  
 It's a problem of purging + recovery. no pure migration for umount here.
 So what's your concern?


See above.



Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master

2010-07-19 Thread Wengang Wang
Any comment?

On 10-06-11 00:25, Wengang Wang wrote:
 If dlm recovery takes place on a non-master node where purging work is going
 on, there could be unexpected references left on some lockres' on the recovery
 master. That is because we migrated the lockres' to the recovery master but
 didn't send deref requests to it accordingly (we were sending them to the dead
 original master or to the UNKNOWN).
 
 Fix:
 For a lockres which is in progress of dropping its reference, we don't migrate
 it to the recovery master, and we unhash the lockres in the purge work.
 For those not in progress of dropping, delay the purge work until recovery has
 finished so that it can send the deref request to the correct master (the
 recovery master) later.
 
 Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
 ---
  fs/ocfs2/dlm/dlmrecovery.c |   17 +++--
  fs/ocfs2/dlm/dlmthread.c   |   36 ++--
  2 files changed, 37 insertions(+), 16 deletions(-)
 
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index f8b75ce..43530ce 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
     struct list_head *queue;
     struct dlm_lock *lock, *next;
 
+    assert_spin_locked(&dlm->spinlock);
+    assert_spin_locked(&res->spinlock);
     res->state |= DLM_LOCK_RES_RECOVERING;
     if (!list_empty(&res->recovering)) {
         mlog(0,
@@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 
             /* the wake_up for this will happen when the
              * RECOVERING flag is dropped later */
-            res->state &= ~DLM_LOCK_RES_DROPPING_REF;
+            if (res->state & DLM_LOCK_RES_DROPPING_REF) {
+                /*
+                 * don't migrate a lockres which is in
+                 * progress of dropping ref
+                 */
+                mlog(ML_NOTICE, "%.*s ignored for "
+                     "migration\n", res->lockname.len,
+                     res->lockname.name);
+                res->state &=
+                    ~DLM_LOCK_RES_DROPPING_REF;
+            } else
+                dlm_move_lockres_to_recovery_list(dlm,
+                                                  res);
 
-            dlm_move_lockres_to_recovery_list(dlm, res);
         } else if (res->owner == dlm->node_num) {
             dlm_free_dead_locks(dlm, res, dead_node);
             __dlm_lockres_calc_usage(dlm, res);
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index d4f73ca..0771420 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
  * truly ready to be freed. */
 int __dlm_lockres_unused(struct dlm_lock_resource *res)
 {
-    if (!__dlm_lockres_has_locks(res) &&
-        (list_empty(&res->dirty) && !(res->state & DLM_LOCK_RES_DIRTY))) {
-        /* try not to scan the bitmap unless the first two
-         * conditions are already true */
-        int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
-        if (bit >= O2NM_MAX_NODES) {
-            /* since the bit for dlm->node_num is not
-             * set, inflight_locks better be zero */
-            BUG_ON(res->inflight_locks != 0);
-            return 1;
-        }
+    int bit;
+
+    if (__dlm_lockres_has_locks(res))
+        return 0;
+
+    if (!list_empty(&res->dirty) || res->state & DLM_LOCK_RES_DIRTY)
+        return 0;
+
+    if (res->state & DLM_LOCK_RES_RECOVERING)
+        return 0;
+
+    bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
+    if (bit >= O2NM_MAX_NODES) {
+        /* since the bit for dlm->node_num is not
+         * set, inflight_locks better be zero */
+        BUG_ON(res->inflight_locks != 0);
+        return 1;
     }
     return 0;
 }
@@ -158,6 +164,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
     int master;
     int ret = 0;
 
+    assert_spin_locked(&dlm->spinlock);
+
     spin_lock(&res->spinlock);
     if (!__dlm_lockres_unused(res)) {
         mlog(0, "%s:%.*s: tried to purge but not unused\n",
@@ -216,13 +224,13 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
              "master = %d\n", res->lockname.len, res->lockname.name,
              res, master);
         list_del_init(&res->purge);
-        spin_unlock(&res->spinlock);
+        /* not the last ref */
   

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master

2010-07-19 Thread Sunil Mushran
Srini has a patch that handles a race concerning DLM_LOCK_RES_DROPPING_REF.
You may want to look at this afresh with that patch.

My problem with this patch is that the description is not clear.


That is because we migrated the lockres' to the recovery master but didn't send
deref requests to it accordingly (we were sending them to the dead original
master or to the UNKNOWN).
 

Do you have the message sequencing that would lead to this situation?
If we migrate the lockres to the reco master, the reco master will send
an assert that will make us change the master.

I think your problem is the one race we have concerning reco/migration.
If so, this fix is not enough.

Sunil

On 07/19/2010 03:09 AM, Wengang Wang wrote:
 Any comment?

 On 10-06-11 00:25, Wengang Wang wrote:
 If dlm recovery takes place on a non-master node where purging work is going
 on, there could be unexpected references left on some lockres' on the recovery
 master. That is because we migrated the lockres' to the recovery master but
 didn't send deref requests to it accordingly (we were sending them to the dead
 original master or to the UNKNOWN).

 Fix:
 For a lockres which is in progress of dropping its reference, we don't migrate
 it to the recovery master, and we unhash the lockres in the purge work.
 For those not in progress of dropping, delay the purge work until recovery has
 finished so that it can send the deref request to the correct master (the
 recovery master) later.

 Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
 ---
   fs/ocfs2/dlm/dlmrecovery.c |   17 +++--
   fs/ocfs2/dlm/dlmthread.c   |   36 ++--
   2 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index f8b75ce..43530ce 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
     struct list_head *queue;
     struct dlm_lock *lock, *next;
 
+    assert_spin_locked(&dlm->spinlock);
+    assert_spin_locked(&res->spinlock);
     res->state |= DLM_LOCK_RES_RECOVERING;
     if (!list_empty(&res->recovering)) {
         mlog(0,
@@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 
             /* the wake_up for this will happen when the
              * RECOVERING flag is dropped later */
-            res->state &= ~DLM_LOCK_RES_DROPPING_REF;
+            if (res->state & DLM_LOCK_RES_DROPPING_REF) {
+                /*
+                 * don't migrate a lockres which is in
+                 * progress of dropping ref
+                 */
+                mlog(ML_NOTICE, "%.*s ignored for "
+                     "migration\n", res->lockname.len,
+                     res->lockname.name);
+                res->state &=
+                    ~DLM_LOCK_RES_DROPPING_REF;
+            } else
+                dlm_move_lockres_to_recovery_list(dlm,
+                                                  res);
 
-            dlm_move_lockres_to_recovery_list(dlm, res);
         } else if (res->owner == dlm->node_num) {
             dlm_free_dead_locks(dlm, res, dead_node);
             __dlm_lockres_calc_usage(dlm, res);
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index d4f73ca..0771420 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
  * truly ready to be freed. */
 int __dlm_lockres_unused(struct dlm_lock_resource *res)
 {
-    if (!__dlm_lockres_has_locks(res) &&
-        (list_empty(&res->dirty) && !(res->state & DLM_LOCK_RES_DIRTY))) {
-        /* try not to scan the bitmap unless the first two
-         * conditions are already true */
-        int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
-        if (bit >= O2NM_MAX_NODES) {
-            /* since the bit for dlm->node_num is not
-             * set, inflight_locks better be zero */
-            BUG_ON(res->inflight_locks != 0);
-            return 1;
-        }
+    int bit;
+
+    if (__dlm_lockres_has_locks(res))
+        return 0;
+
+    if (!list_empty(&res->dirty) || res->state & DLM_LOCK_RES_DIRTY)
+        return 0;
+
+    if (res->state & DLM_LOCK_RES_RECOVERING)
+        return 0;
+
+    bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
+    if (bit >= O2NM_MAX_NODES) {
+        /* since the bit for dlm->node_num is not
+         * set, inflight_locks better be zero */
+        BUG_ON(res->inflight_locks 

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master

2010-07-19 Thread Wengang Wang
On 10-07-19 16:52, Sunil Mushran wrote:
 Srini has a patch that handles a race concerning DLM_LOCK_RES_DROPPING_REF.
 You may want to look at this afresh with that patch.
 
 My problem with this patch is that the description is not clear.
 
 
 That is because we migrated the lockres' to the recovery master but didn't send
 deref requests to it accordingly (we were sending them to the dead original
 master or to the UNKNOWN).
 
 
 Do you have the message sequencing that would lead to this situation?
 If we migrate the lockres to the reco master, the reco master will send
 an assert that will make us change the master.

So first, the problem is not about the changing owner. It is that
the bit (in the refmap) on behalf of the node in question is not cleared on the
new master (the recovery master), so the new master will fail at purging the
lockres due to the incorrect bit in the refmap.

Second, I have no messages at hand for the situation. But I think it is simple
enough.

1) node A has no interest in lockres A any longer, so it is purging it.
2) the owner of lockres A is node B, so node A is sending a de-ref message
to node B.
3) at this time, node B crashes. node C becomes the recovery master. it
recovers lockres A (because the master is the dead node B).
4) node A migrates lockres A to node C with a refbit there.
5) node A fails to send the de-ref message to node B because it crashed. The
failure is ignored; no other action is done for lockres A any more.

So node A means to drop the ref on the master. But in such a situation, node C
keeps the ref on behalf of node A unexpectedly. Node C finally fails at purging
lockres A and hangs on umount.
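
A minimal sketch of why that leftover bit makes node C hang: the refmap scan
below is taken from __dlm_lockres_unused() (quoted with the patch later in this
mail), and the inline comments are mine:

    /* on node C: __dlm_lockres_unused() checks the refmap before a lockres
     * may be purged; node A's stale bit keeps it "in use" indefinitely */
    bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
    if (bit >= O2NM_MAX_NODES) {
        BUG_ON(res->inflight_locks != 0);
        return 1;    /* no refs left: safe to purge */
    }
    return 0;        /* stale bit set: never purged, so umount hangs */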

 
 I think your problem is the one race we have concerning reco/migration.
 If so, this fix is not enough.

It's a problem of purging + recovery. no pure migration for umount here.
So what's your concern?

regards,
wengang.
 
 Sunil
 
 On 07/19/2010 03:09 AM, Wengang Wang wrote:
 Any comment?
 
 On 10-06-11 00:25, Wengang Wang wrote:
 If dlm recovery takes place on a non-master node where purging work is going
 on, there could be unexpected references left on some lockres' on the recovery
 master. That is because we migrated the lockres' to the recovery master but
 didn't send deref requests to it accordingly (we were sending them to the dead
 original master or to the UNKNOWN).
 
 Fix:
 For a lockres which is in progress of dropping its reference, we don't migrate
 it to the recovery master, and we unhash the lockres in the purge work.
 For those not in progress of dropping, delay the purge work until recovery has
 finished so that it can send the deref request to the correct master (the
 recovery master) later.
 
 Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
 ---
   fs/ocfs2/dlm/dlmrecovery.c |   17 +++--
   fs/ocfs2/dlm/dlmthread.c   |   36 ++--
   2 files changed, 37 insertions(+), 16 deletions(-)
 
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index f8b75ce..43530ce 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
     struct list_head *queue;
     struct dlm_lock *lock, *next;
 
+    assert_spin_locked(&dlm->spinlock);
+    assert_spin_locked(&res->spinlock);
     res->state |= DLM_LOCK_RES_RECOVERING;
     if (!list_empty(&res->recovering)) {
         mlog(0,
@@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 
             /* the wake_up for this will happen when the
              * RECOVERING flag is dropped later */
-            res->state &= ~DLM_LOCK_RES_DROPPING_REF;
+            if (res->state & DLM_LOCK_RES_DROPPING_REF) {
+                /*
+                 * don't migrate a lockres which is in
+                 * progress of dropping ref
+                 */
+                mlog(ML_NOTICE, "%.*s ignored for "
+                     "migration\n", res->lockname.len,
+                     res->lockname.name);
+                res->state &=
+                    ~DLM_LOCK_RES_DROPPING_REF;
+            } else
+                dlm_move_lockres_to_recovery_list(dlm,
+                                                  res);
 
-            dlm_move_lockres_to_recovery_list(dlm, res);
         } else if (res->owner == dlm->node_num) {
             dlm_free_dead_locks(dlm, res, dead_node);
             __dlm_lockres_calc_usage(dlm, res);
 diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
 index d4f73ca..0771420 100644
 --- a/fs/ocfs2/dlm/dlmthread.c
 +++ b/fs/ocfs2/dlm/dlmthread.c
 @@ -92,17 +92,23 @@ int 

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master

2010-07-05 Thread Wengang Wang
Any comment?

On 10-06-25 09:55, Wengang Wang wrote:
 Hi,
 
 Any comment on this?
 
 regards,
 wengang.
 On 10-06-11 00:25, Wengang Wang wrote:
 If dlm recovery takes place on a non-master node where purging work is going
 on, there could be unexpected references left on some lockres' on the recovery
 master. That is because we migrated the lockres' to the recovery master but
 didn't send deref requests to it accordingly (we were sending them to the dead
 original master or to the UNKNOWN).
  
 Fix:
 For a lockres which is in progress of dropping its reference, we don't migrate
 it to the recovery master, and we unhash the lockres in the purge work.
 For those not in progress of dropping, delay the purge work until recovery has
 finished so that it can send the deref request to the correct master (the
 recovery master) later.
  
 Signed-off-by: Wengang Wang <wen.gang.w...@oracle.com>
  ---
   fs/ocfs2/dlm/dlmrecovery.c |   17 +++--
   fs/ocfs2/dlm/dlmthread.c   |   36 ++--
   2 files changed, 37 insertions(+), 16 deletions(-)
  
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index f8b75ce..43530ce 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
     struct list_head *queue;
     struct dlm_lock *lock, *next;
 
+    assert_spin_locked(&dlm->spinlock);
+    assert_spin_locked(&res->spinlock);
     res->state |= DLM_LOCK_RES_RECOVERING;
     if (!list_empty(&res->recovering)) {
         mlog(0,
@@ -2336,9 +2338,20 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 
             /* the wake_up for this will happen when the
              * RECOVERING flag is dropped later */
-            res->state &= ~DLM_LOCK_RES_DROPPING_REF;
+            if (res->state & DLM_LOCK_RES_DROPPING_REF) {
+                /*
+                 * don't migrate a lockres which is in
+                 * progress of dropping ref
+                 */
+                mlog(ML_NOTICE, "%.*s ignored for "
+                     "migration\n", res->lockname.len,
+                     res->lockname.name);
+                res->state &=
+                    ~DLM_LOCK_RES_DROPPING_REF;
+            } else
+                dlm_move_lockres_to_recovery_list(dlm,
+                                                  res);
 
-            dlm_move_lockres_to_recovery_list(dlm, res);
         } else if (res->owner == dlm->node_num) {
             dlm_free_dead_locks(dlm, res, dead_node);
             __dlm_lockres_calc_usage(dlm, res);
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index d4f73ca..0771420 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
  * truly ready to be freed. */
 int __dlm_lockres_unused(struct dlm_lock_resource *res)
 {
-    if (!__dlm_lockres_has_locks(res) &&
-        (list_empty(&res->dirty) && !(res->state & DLM_LOCK_RES_DIRTY))) {
-        /* try not to scan the bitmap unless the first two
-         * conditions are already true */
-        int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
-        if (bit >= O2NM_MAX_NODES) {
-            /* since the bit for dlm->node_num is not
-             * set, inflight_locks better be zero */
-            BUG_ON(res->inflight_locks != 0);
-            return 1;
-        }
+    int bit;
+
+    if (__dlm_lockres_has_locks(res))
+        return 0;
+
+    if (!list_empty(&res->dirty) || res->state & DLM_LOCK_RES_DIRTY)
+        return 0;
+
+    if (res->state & DLM_LOCK_RES_RECOVERING)
+        return 0;
+
+    bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
+    if (bit >= O2NM_MAX_NODES) {
+        /* since the bit for dlm->node_num is not
+         * set, inflight_locks better be zero */
+        BUG_ON(res->inflight_locks != 0);
+        return 1;
     }
     return 0;
 }
@@ -158,6 +164,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
     int master;
     int ret = 0;
 
+    assert_spin_locked(&dlm->spinlock);
+
     spin_lock(&res->spinlock);
     if (!__dlm_lockres_unused(res)) {
         mlog(0, "%s:%.*s: tried to purge but not unused\n",
@@ -216,13 +224,13 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
              "master = %d\n", res->lockname.len, res->lockname.name,
              res, master);