Re: [Ocfs2-devel] What's the need of OCFS2_INODE_MAYBE_ORPHANED?

2014-01-13 Thread Joel Becker
On Thu, Jan 09, 2014 at 07:35:05AM -0600, Goldwyn Rodrigues wrote:
 On 01/09/2014 04:23 AM, Joel Becker wrote:
 Unlink can happen from anywhere, but only the last closing node can
 actually remove the file.  MAYBE_ORPHANED tells the node to try for
 removal at close time.  It is absolutely necessary.
 
 
 The reason I asked is that OCFS2_INODE_MAYBE_ORPHANED is
 being set at every dentry downconvert. Is this really necessary, given
 that not every dentry downconvert turns into an unlink? (I know
 it says maybe :/ )
 
 Is it okay to set it when the open_lock fails or is it too late in
 the process? If another node has performed an unlink, it would need
 to get the open lock before it performs the inode wipe. So we should
 be safe that way? Is there anything incorrect in this design?

It's not safe.  Srini has already answered this on the other part of the
thread.  I'll address your other comments there.
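A minimal user-space sketch of the mechanism described above, assuming a heavily simplified model. The names `struct node_inode`, `on_dentry_downconvert()` and `on_close()` are hypothetical and do not exist in fs/ocfs2; the cluster-wide open-lock check is reduced to a boolean. What it illustrates is the timing. The flag is recorded when the dentry lock is downconverted, so a node that closes the file later still attempts removal even if the unlinking node crashed before reaching its own open-lock step (the crash scenario discussed elsewhere in this thread).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-node view of one inode.  These are NOT
 * real OCFS2 structures; they only model the idea in the thread. */
struct node_inode {
	bool maybe_orphaned; /* stand-in for OCFS2_INODE_MAYBE_ORPHANED */
	int open_count;      /* opens of the file held on this node */
};

/* Called when another node's unlink forces this node to downconvert its
 * dentry lock.  The node cannot tell whether the last link went away, so
 * it conservatively records that the inode may now be orphaned. */
static void on_dentry_downconvert(struct node_inode *ni)
{
	ni->maybe_orphaned = true;
}

/* Called on each local close.  Returns true when this node should try to
 * remove the inode.  "Is the file still open on another node?" is the
 * job of the cluster open lock; here it is just a parameter. */
static bool on_close(struct node_inode *ni, bool open_elsewhere)
{
	assert(ni->open_count > 0);
	ni->open_count--;
	if (ni->open_count > 0 || !ni->maybe_orphaned)
		return false;
	return !open_elsewhere;
}
```

In this model, if the flag were set only when an open-lock attempt fails, the closing node's flag would stay false in the crash case and its last close would skip the cleanup.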

Joel

 
 
 -- 
 Goldwyn

-- 

You look in her eyes, the music begins to play.
 Hopeless romantics, here we go again.

http://www.jlbec.org/
jl...@evilplan.org

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] What's the need of OCFS2_INODE_MAYBE_ORPHANED?

2014-01-13 Thread Joel Becker
On Thu, Jan 09, 2014 at 11:27:15AM -0600, Goldwyn Rodrigues wrote:
  Yes, I did not consider that.
  How about using the open lock's ro_holders count to identify this? That may
  just work. Thanks!
  One problem I see in using the open lock for this is that it could be too late.
  Consider the scenario where node A removes the dentry and then the node
  crashes before attempting the try_open_lock. Node B closes the file
  later, but it doesn't know that the file was unlinked and doesn't do the
  cleanup.
 
  To me it appears OCFS2_INODE_MAYBE_ORPHANED is necessary. Any delay it
  is causing must be addressed differently.
 
 No, I don't mean to remove the OCFS2_INODE_MAYBE_ORPHANED flag, but to set
 it conditionally in ocfs2_dentry_convert_worker() based on the open
 locks held.

I'm confused by what you are attempting here.  We hold the dentry lock
until the final dput() (see the comment in fs/ocfs2/dcache.c).  We should
never have ro_holders==0 unless we're flushing the entry from the
dcache.  Do you mean something else?
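To make the point about ro_holders concrete, here is a hypothetical holder-count model (`struct dentry_lock_model`, `dentry_attach()`, `final_dput()` and `proposed_should_set_flag()` are illustrative names, not OCFS2 code). Assuming, as stated above, that a cached dentry keeps its hold until the final dput(), a condition of `ro_holders == 0` stays false for as long as the dentry is cached and only becomes true once the entry is being dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a dentry lock's read-only holder count; not the
 * real struct ocfs2_lock_res. */
struct dentry_lock_model {
	unsigned int ro_holders;
};

/* Instantiating the dentry takes a read-only hold that it keeps. */
static void dentry_attach(struct dentry_lock_model *dl)
{
	dl->ro_holders++;
}

/* Only the final dput() of the dentry drops that hold. */
static void final_dput(struct dentry_lock_model *dl)
{
	assert(dl->ro_holders > 0);
	dl->ro_holders--;
}

/* The proposed condition: set the flag only when nobody holds the lock. */
static bool proposed_should_set_flag(const struct dentry_lock_model *dl)
{
	return dl->ro_holders == 0;
}
```

Under these assumptions the conditional version would almost never fire during normal operation, which is why it does not obviously reduce the number of times the flag is set.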

Joel

-- 

Now Someone's on the telephone, desperate in his pain.
 Someone's on the bathroom floor doing her cocaine.
 Someone's got his finger on the button in some room.
 No one can convince me we aren't gluttons for our doom.




Re: [Ocfs2-devel] [PATCH 1/1] o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper

2014-01-13 Thread Joel Becker
On Fri, Jan 10, 2014 at 05:19:13PM -0800, Srinivas Eeda wrote:
 From: Srinivas Eeda seeda@srini.(none)
 
 A tiny race between BAST and unlock message causes the NULL dereference.
 
 A node sends an unlock request to the master and receives a response. Before
 processing the response, it receives a BAST from the master. Since the two
 messages are processed by different threads, this creates a race: while the
 BAST is being processed, the lock can be freed by the unlock code.
 
 This patch makes the BAST handler return immediately if the lock is found but
 an unlock is pending. The code should handle this race. We also have to fix
 the master node to skip sending a BAST after receiving an unlock message.

Did the master send the BAST after the unlock, or does that race too?
Does the master know the unlock has succeeded, or does it just think so?

 @@ -385,8 +385,13 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,
  	head = &res->granted;
  
  	list_for_each_entry(lock, head, list) {
 -		if (lock->ml.cookie == cookie)
 -			goto do_ast;
 +		/* if lock is found but unlock is pending ignore the bast */
 +		if (lock->ml.cookie == cookie) {
 +			if (lock->unlock_pending)
 +				break;
 +			else
 +				goto do_ast;
 +		}

This breaks out for asts as well as basts.  Can't that cause problems
with the unlock ast expected by the caller?
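Joel's concern is that the `break` applies to every message type, and in the patch as quoted, leaving the loop falls through to the "unknown lock" log path that follows it. A hedged sketch of a variant that filters only BASTs, written against hypothetical user-space stand-ins (`enum msg_type`, `struct lock_model`, `classify()`), not the real o2dlm types. Whether an AST can legitimately arrive for a lock with an unlock pending is exactly the open question in this thread, so this shows one option, not the agreed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the o2dlm message and lock types. */
enum msg_type { MSG_AST, MSG_BAST };

struct lock_model {
	unsigned long long cookie;
	bool unlock_pending;
};

enum verdict { DELIVER, IGNORE, UNKNOWN_LOCK };

/* Search a queue for the lock named by the message.  Only a BAST is
 * dropped when an unlock is in flight; an AST is still delivered. */
static enum verdict classify(const struct lock_model *queue, size_t n,
			     unsigned long long cookie, enum msg_type type)
{
	assert(queue != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (queue[i].cookie != cookie)
			continue;
		if (type == MSG_BAST && queue[i].unlock_pending)
			return IGNORE;
		return DELIVER;
	}
	return UNKNOWN_LOCK;
}
```

Returning a distinct IGNORE verdict also keeps the "lock found but skipped" case separate from the genuinely unknown-lock case, which the posted `break` merges.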

Joel


-- 

Not being known doesn't stop the truth from being true.
- Richard Bach




Re: [Ocfs2-devel] [PATCH 1/1] o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper

2014-01-13 Thread Joseph Qi
On 2014/1/11 9:19, Srinivas Eeda wrote:
 From: Srinivas Eeda seeda@srini.(none)
 
 A tiny race between BAST and unlock message causes the NULL dereference.
 
 A node sends an unlock request to the master and receives a response. Before
 processing the response, it receives a BAST from the master. Since the two
 messages are processed by different threads, this creates a race: while the
 BAST is being processed, the lock can be freed by the unlock code.
 
 This patch makes the BAST handler return immediately if the lock is found but
 an unlock is pending. The code should handle this race. We also have to fix
 the master node to skip sending a BAST after receiving an unlock message.
 
 Below is the crash stack
 
 BUG: unable to handle kernel NULL pointer dereference at 0048
 IP: [a015e023] o2dlm_blocking_ast_wrapper+0xd/0x16
 [a034e3db] dlm_do_local_bast+0x8e/0x97 [ocfs2_dlm]
 [a034f366] dlm_proxy_ast_handler+0x838/0x87e [ocfs2_dlm]
 [a0308abe] o2net_process_message+0x395/0x5b8 [ocfs2_nodemanager]
 [a030aac8] o2net_rx_until_empty+0x762/0x90d [ocfs2_nodemanager]
 [81071802] worker_thread+0x14d/0x1ed
 
 Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
 ---
  fs/ocfs2/dlm/dlmast.c | 9 +++++++--
  1 file changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/fs/ocfs2/dlm/dlmast.c b/fs/ocfs2/dlm/dlmast.c
 index b46278f..dbc6cee 100644
 --- a/fs/ocfs2/dlm/dlmast.c
 +++ b/fs/ocfs2/dlm/dlmast.c
 @@ -385,8 +385,13 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,
  	head = &res->granted;
  
  	list_for_each_entry(lock, head, list) {
 -		if (lock->ml.cookie == cookie)
 -			goto do_ast;
 +		/* if lock is found but unlock is pending ignore the bast */
 +		if (lock->ml.cookie == cookie) {
 +			if (lock->unlock_pending)
 +				break;
 +			else
 +				goto do_ast;
 +		}
  	}
  
  	mlog(0, "Got %sast for unknown lock! cookie=%u:%llu, name=%.*s, "
 
I see you sent a version of this on Jan 30, 2012.
https://oss.oracle.com/pipermail/ocfs2-devel/2012-January/008469.html
Compared with the old version, this version only saves a little CPU time,
am I right?
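The faulting address in the crash stack quoted in this message is tiny (`0048`), which usually means code read a structure member at that offset through a NULL base pointer. That fits the patch description's claim that the unlock path can tear the lock down while the BAST is being processed. The struct below is entirely made up to show the arithmetic; the thread does not say which member sits at offset 0x48 in the real o2dlm structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the padding places bast_fn at offset 0x48.  This
 * is NOT the layout of any real OCFS2 structure. */
struct example_lock {
	char other_fields[0x48];
	void (*bast_fn)(void *data, int level);
};

static_assert(offsetof(struct example_lock, bast_fn) == 0x48,
	      "bast_fn is expected at offset 0x48");

/* Address the CPU would load from when evaluating p->bast_fn for a
 * pointer p whose numeric value is base. */
static uintptr_t field_address(uintptr_t base)
{
	return base + offsetof(struct example_lock, bast_fn);
}
```

With `base == 0` the computed address is 0x48, the same small value the oops reports.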




Re: [Ocfs2-devel] [PATCH 1/1] o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper

2014-01-13 Thread Srinivas Eeda
On 01/13/2014 08:06 PM, Joseph Qi wrote:
 I see you sent a version of this on Jan 30, 2012.
 https://oss.oracle.com/pipermail/ocfs2-devel/2012-January/008469.html
 Compared with the old version, this version only saves a little CPU time,
 am I right?
Yes, you are right. I made the change Goldwyn suggested, which is a
good thing to have :)



Re: [Ocfs2-devel] [PATCH 1/1] o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper

2014-01-13 Thread Srinivas Eeda
On 01/13/2014 07:37 AM, Joel Becker wrote:
 On Fri, Jan 10, 2014 at 05:19:13PM -0800, Srinivas Eeda wrote:
 From: Srinivas Eeda seeda@srini.(none)

 A tiny race between BAST and unlock message causes the NULL dereference.

 A node sends an unlock request to the master and receives a response. Before
 processing the response, it receives a BAST from the master. Since the two
 messages are processed by different threads, this creates a race: while the
 BAST is being processed, the lock can be freed by the unlock code.

 This patch makes the BAST handler return immediately if the lock is found but
 an unlock is pending. The code should handle this race. We also have to fix
 the master node to skip sending a BAST after receiving an unlock message.
 Did the master send the BAST after the unlock, or does that race too?
 Does the master know the unlock has succeeded, or does it just think so?
I think it's due to a race, but I haven't debugged the master. My guess
is that the unlock request sneaked in before dlm_flush_asts() was called.
However, the non-master node should handle this race as well, so I just did
that part, which fixed a bug we were seeing.
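One way to picture the master-side half of the fix, under the assumption that BASTs are queued and sent later by a separate flush step (so an unlock can arrive in between): drop any still-queued BAST for a lock when its unlock is processed. The names `struct pending_basts`, `queue_bast()` and `cancel_basts_on_unlock()` are hypothetical and do not describe how dlm_flush_asts() or o2dlm actually work. Even with such a change, a BAST already on the wire can still reach the node after the unlock response, so the non-master check remains necessary.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical master-side queue of BASTs waiting for a flush step. */
struct pending_basts {
	unsigned long long cookies[MAX_PENDING];
	size_t n;
};

static void queue_bast(struct pending_basts *q, unsigned long long cookie)
{
	assert(q->n < MAX_PENDING);
	q->cookies[q->n++] = cookie;
}

/* On unlock, remove every queued BAST for that lock (compacting the
 * array in place) so the flush step never sends it.  Returns the number
 * of BASTs dropped. */
static size_t cancel_basts_on_unlock(struct pending_basts *q,
				     unsigned long long cookie)
{
	size_t kept = 0, dropped = 0;

	for (size_t i = 0; i < q->n; i++) {
		if (q->cookies[i] == cookie)
			dropped++;
		else
			q->cookies[kept++] = q->cookies[i];
	}
	q->n = kept;
	return dropped;
}
```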



 @@ -385,8 +385,13 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,
  	head = &res->granted;
  
  	list_for_each_entry(lock, head, list) {
 -		if (lock->ml.cookie == cookie)
 -			goto do_ast;
 +		/* if lock is found but unlock is pending ignore the bast */
 +		if (lock->ml.cookie == cookie) {
 +			if (lock->unlock_pending)
 +				break;
 +			else
 +				goto do_ast;
 +		}
 This breaks out for asts as well as basts.  Can't that cause problems
 with the unlock ast expected by the caller?
If unlock_pending is set, then the node is trying to unlock an existing
lock and shouldn't receive any ASTs?

 Joel



