On 8/31/17 9:14 AM, Daniel Gryniewicz wrote:
On 08/30/2017 10:06 PM, Pradeep wrote:
Hi all,

I'm hitting a crash in TIRPC with Ganesha 2.6-dev.5. It appears to me that 
there is a race between an incoming RPC message on a new xprt (for which 
accept() was done on the FD) and TIRPC setting the process_cb on the new xprt.

We set xprt->xp_dispatch.process_cb from the rendezvous function (nfs_rpc_dispatch_tcp_NFS in the case of NFS/TCP), which is called at the end of svc_vc_rendezvous(). But before this happens, an RPC request could already be invoking svc_vc_recv(), because we have already called accept(). Shouldn't we set up the xprt before accept()?

It's not the accept itself, but adding the accepted fd to epoll, which also 
happens before the rendezvous.  I think the call to svc_rqst_xprt_register() 
needs to be last, or a lock needs to be taken.
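
To make the "register last" ordering concrete, here is a minimal, Linux-only sketch. It is not ntirpc code: struct demo_xprt and every demo_* function are hypothetical names made up for illustration. The only point is that the callback is set before epoll_ctl() makes the fd visible to the event loop, so data that arrives early still finds a valid callback.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a transport; NOT ntirpc's SVCXPRT. */
struct demo_xprt {
    int fd;
    int (*process_cb)(struct demo_xprt *); /* dispatch callback */
    int calls;                             /* times process_cb ran */
};

/* Hypothetical dispatch callback: just counts invocations. */
int demo_process(struct demo_xprt *x)
{
    x->calls++;
    return 0;
}

/* Finish every piece of setup first; make the fd visible to epoll last. */
int demo_rendezvous(int epfd, struct demo_xprt *x, int fd)
{
    struct epoll_event ev;

    x->fd = fd;
    x->calls = 0;
    x->process_cb = demo_process;      /* setup completes here */

    ev.events = EPOLLIN;
    ev.data.ptr = x;
    /* From this call on, the event loop may pick up the transport. */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One pass of an event loop: 1 if dispatched, 0 if nothing was ready,
 * -1 on error or if the callback was never set (the reported crash). */
int demo_poll_once(int epfd)
{
    struct epoll_event ev;
    struct demo_xprt *x;
    int n = epoll_wait(epfd, &ev, 1, 0);

    if (n <= 0)
        return n;
    x = ev.data.ptr;
    assert(x != NULL);
    if (x->process_cb == NULL)
        return -1;
    x->process_cb(x);
    return 1;
}

/* Data is written before setup runs; returns the callback count, or -1. */
int demo_run(void)
{
    struct demo_xprt x = { -1, NULL, 0 };
    int sv[2];
    int calls = -1;
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0) {
        /* The peer sends immediately, before the transport is set up. */
        if (write(sv[1], "x", 1) == 1 &&
            demo_rendezvous(epfd, &x, sv[0]) == 0 &&
            demo_poll_once(epfd) == 1)
            calls = x.calls;
        close(sv[0]);
        close(sv[1]);
    }
    close(epfd);
    return calls;
}
```

Calling demo_run() from a main() should return 1, meaning the early data was dispatched through a fully initialized transport. The sketch is single-threaded, so it shows the ordering only and does not reproduce the race itself. If epoll_ctl() were moved above the process_cb assignment, a second thread sitting in epoll_wait() could observe the NULL callback.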

Bill?

Yes, that's a problem.  I checked v2.5 (ntirpc 1.5) and that has the
same issue.  It registers the fd with epoll before doing other essential
things, like setting up the recvsize and sendsize, and calling (old)
xp_recv_user_data (now named nfs_rpc_dispatch_tcp_NFS).

My guess is you're seeing it because the 2.6 epoll loop is much faster.
We're expecting to find more of these timing and code ordering errors.

But it looks like a relatively easy fix.

Thanks for the excellent detailed report.  So helpful!

_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
