Re: [Ocfs2-devel] [PATCH v2] ocfs2/dlm: don't handle migrate lockres if already in shutdown

2018-03-04 Thread piaojun
Hi Joseph,

On 2018/3/5 9:27, Joseph Qi wrote:
> 
> 
> On 18/3/3 08:45, piaojun wrote:
>> We should not handle migrate lockres if we are already in
>> 'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
>> dlm domain. At last other nodes will get stuck into infinite loop when
>> requsting lock from us.
>>
>> The problem is caused by concurrency umount between nodes. Before
>> receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
>> migrate target. So N2 will continue sending lockres to N1 even though N1
>> has left domain.
>>
>> N1 N2 (owner)
>>touch file
>>
>> access the file,
>> and get pr lock
>>
>>begin leave domain and
>>pick up N1 as new owner
>>
>> begin leave domain and
>> migrate all lockres done
>>
>>begin migrate lockres to N1
>>
>> end leave domain, but
>> the lockres left
>> unexpectedly, because
>> migrate task has passed
>>
>> Signed-off-by: Jun Piao 
>> Reviewed-by: Yiwen Jiang  ---
>>  fs/ocfs2/dlm/dlmdomain.c   | 14 ++
>>  fs/ocfs2/dlm/dlmdomain.h   |  1 +
>>  fs/ocfs2/dlm/dlmrecovery.c |  9 +
>>  3 files changed, 24 insertions(+)
>>
>> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
>> index e1fea14..3b7ec51 100644
>> --- a/fs/ocfs2/dlm/dlmdomain.c
>> +++ b/fs/ocfs2/dlm/dlmdomain.c
>> @@ -675,6 +675,20 @@ static void dlm_leave_domain(struct dlm_ctxt *dlm)
>>  spin_unlock(>spinlock);
>>  }
>>
>> +int dlm_joined(struct dlm_ctxt *dlm)
> This helper can be static inline and placed into header.
Agree, and I decide move dlm_shutting_down() into dlmdomain.h together.

thansk,
Jun
> 
>> +{
>> +int ret = 0;
>> +
>> +spin_lock(_domain_lock);
>> +
> Delete blank line here.
> 
>> +if (dlm->dlm_state == DLM_CTXT_JOINED)
>> +ret = 1;
>> +
> Also here.
> 
> Except the above concern, it looks good to me.
> With they are fixed, feel free to add:
> 
> Reviewed-by: Joseph Qi 
> 
>> +spin_unlock(_domain_lock);
>> +
>> +return ret;
>> +}
>> +
>>  int dlm_shutting_down(struct dlm_ctxt *dlm)
>>  {
>>  int ret = 0;
>> diff --git a/fs/ocfs2/dlm/dlmdomain.h b/fs/ocfs2/dlm/dlmdomain.h
>> index fd6122a..2f7f60b 100644
>> --- a/fs/ocfs2/dlm/dlmdomain.h
>> +++ b/fs/ocfs2/dlm/dlmdomain.h
>> @@ -28,6 +28,7 @@
>>  extern spinlock_t dlm_domain_lock;
>>  extern struct list_head dlm_domains;
>>
>> +int dlm_joined(struct dlm_ctxt *dlm);
>>  int dlm_shutting_down(struct dlm_ctxt *dlm);
>>  void dlm_fire_domain_eviction_callbacks(struct dlm_ctxt *dlm,
>>  int node_num);
>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
>> index ec8f758..505ab42 100644
>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>> @@ -1378,6 +1378,15 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, 
>> u32 len, void *data,
>>  if (!dlm_grab(dlm))
>>  return -EINVAL;
>>
>> +if (!dlm_joined(dlm)) {
>> +mlog(ML_ERROR, "Domain %s not joined! "
>> +  "lockres %.*s, master %u\n",
>> +  dlm->name, mres->lockname_len,
>> +  mres->lockname, mres->master);
>> +dlm_put(dlm);
>> +return -EINVAL;
>> +}
>> +
>>  BUG_ON(!(mres->flags & (DLM_MRES_RECOVERY|DLM_MRES_MIGRATION)));
>>
>>  real_master = mres->master;
>>
> .
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2/dlm: don't handle migrate lockres if already in shutdown

2018-03-04 Thread Joseph Qi


On 18/3/3 08:45, piaojun wrote:
> We should not handle migrate lockres if we are already in
> 'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
> dlm domain. At last other nodes will get stuck into infinite loop when
> requsting lock from us.
> 
> The problem is caused by concurrency umount between nodes. Before
> receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
> migrate target. So N2 will continue sending lockres to N1 even though N1
> has left domain.
> 
> N1 N2 (owner)
>touch file
> 
> access the file,
> and get pr lock
> 
>begin leave domain and
>pick up N1 as new owner
> 
> begin leave domain and
> migrate all lockres done
> 
>begin migrate lockres to N1
> 
> end leave domain, but
> the lockres left
> unexpectedly, because
> migrate task has passed
> 
> Signed-off-by: Jun Piao 
> Reviewed-by: Yiwen Jiang  ---
>  fs/ocfs2/dlm/dlmdomain.c   | 14 ++
>  fs/ocfs2/dlm/dlmdomain.h   |  1 +
>  fs/ocfs2/dlm/dlmrecovery.c |  9 +
>  3 files changed, 24 insertions(+)
> 
> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> index e1fea14..3b7ec51 100644
> --- a/fs/ocfs2/dlm/dlmdomain.c
> +++ b/fs/ocfs2/dlm/dlmdomain.c
> @@ -675,6 +675,20 @@ static void dlm_leave_domain(struct dlm_ctxt *dlm)
>   spin_unlock(>spinlock);
>  }
> 
> +int dlm_joined(struct dlm_ctxt *dlm)
This helper can be static inline and placed into header.

> +{
> + int ret = 0;
> +
> + spin_lock(_domain_lock);
> +
Delete blank line here.

> + if (dlm->dlm_state == DLM_CTXT_JOINED)
> + ret = 1;
> +
Also here.

Except the above concern, it looks good to me.
With they are fixed, feel free to add:

Reviewed-by: Joseph Qi 

> + spin_unlock(_domain_lock);
> +
> + return ret;
> +}
> +
>  int dlm_shutting_down(struct dlm_ctxt *dlm)
>  {
>   int ret = 0;
> diff --git a/fs/ocfs2/dlm/dlmdomain.h b/fs/ocfs2/dlm/dlmdomain.h
> index fd6122a..2f7f60b 100644
> --- a/fs/ocfs2/dlm/dlmdomain.h
> +++ b/fs/ocfs2/dlm/dlmdomain.h
> @@ -28,6 +28,7 @@
>  extern spinlock_t dlm_domain_lock;
>  extern struct list_head dlm_domains;
> 
> +int dlm_joined(struct dlm_ctxt *dlm);
>  int dlm_shutting_down(struct dlm_ctxt *dlm);
>  void dlm_fire_domain_eviction_callbacks(struct dlm_ctxt *dlm,
>   int node_num);
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index ec8f758..505ab42 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1378,6 +1378,15 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 
> len, void *data,
>   if (!dlm_grab(dlm))
>   return -EINVAL;
> 
> + if (!dlm_joined(dlm)) {
> + mlog(ML_ERROR, "Domain %s not joined! "
> +   "lockres %.*s, master %u\n",
> +   dlm->name, mres->lockname_len,
> +   mres->lockname, mres->master);
> + dlm_put(dlm);
> + return -EINVAL;
> + }
> +
>   BUG_ON(!(mres->flags & (DLM_MRES_RECOVERY|DLM_MRES_MIGRATION)));
> 
>   real_master = mres->master;
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel