will do, thanks for posting Matt
On Mon, Aug 14, 2017 at 4:32 PM, Malahal Naineni <mala...@gmail.com> wrote:
> Hi Matt and Bill, we were able to reproduce this crash very easily with a
> sleep after closing "fd". After my fix, things worked fine. The changes are
> a lot but mostly trivial. Appreciate any high level review.
>
> ganesha changes (last but one commit at
> https://github.com/ganltc/nfs-ganesha/commits/ibm2.3).
>
> Corresponding ntirpc commit (last commit)
> https://github.com/ganltc/ntirpc/commits/ibm2.3
>
> On Mon, Aug 14, 2017 at 5:02 PM, Malahal Naineni <mala...@gmail.com> wrote:
>>
>> Unfortunately, I need a fix for this issue against ganesha2.3.
>>
>> Regards, Malahal.
>>
>> On Mon, Aug 14, 2017 at 4:18 PM, William Allen Simpson
>> <william.allen.simp...@gmail.com> wrote:
>>>
>>> On 8/13/17 11:50 PM, Malahal Naineni wrote:
>>>>
>>>> >> That trace is the NSM clnt_dg clnt_call, the only use of outgoing
>>>> UDP. It's a mess, and has been a mess for a long time.
>>>>
>>>> We get a file descriptor fd and then create "rec", but while destroying
>>>> things, we close "fd" and then rpc_dplx_unref(). Re-arranging these in
>>>> clnt_dg_destroy() (and other places) might help fix this issue, but I am
>>>> not positive as I am not familiar with this code.
>>>>
>>>> I am also working on a blind replacement of "fd" by "struct gfd" where
>>>> struct gfd has the "fd" as well as a "generation number". The generation
>>>> number is incremented when ever such "fd" is created (e.g. accept() call or
>>>> socket() call). The changes are many but they are trivial.
>>>>
>>>> Any thoughts?
>>>>
>>> It's not really interesting for the current code base. In V2.5, I've
>>> already eliminated all the various copies of fd, and every SVCXPRT is
>>> wrapped inside a dplx_rec, and they all use xp_fd, and it's in only one
>>> tree (svc_rqst). So there's no longer any possibility of multiple
>>> generations of fd.
>>>
>>> That said, the last remaining problem is clnt_dg clnt_call, where the
>>> fd can be passed to poll() at the same time as another copy is passed to
>>> (or being removed from) epoll(). Requires a complete re-write.
>>>
>>> I'd started doing the re-write long long ago, even made the rpc_ctx
>>> transport independent (committed in V2.6/v1.6 Napalm rendezvous patch).
>>> But there are still many problems redesigning with async callbacks.
>>>
>>> I'm looking at the short-term fix I've mentioned earlier, that we should
>>> try TCP before UDP, but given our current code base doesn't even compile,
>>> I've given up until next week.

_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
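
For anyone following the "struct gfd" idea quoted above, here is a rough sketch of how a generation number can expose a stale copy of a descriptor. It is not taken from the ibm2.3 commits and may differ from them considerably. Every name in it (gfd_socket, gfd_is_current, gfd_close, gfd_demo, GFD_MAX) is made up for illustration, and the per-fd table is an assumption about how the current generation might be tracked. It is also not thread-safe; real code would need locking or atomics.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the ntirpc or ganesha API. */
#define GFD_MAX 1024

struct gfd {
	int fd;       /* descriptor number; the kernel may reuse it after close() */
	uint64_t gen; /* generation assigned when this descriptor was created */
};

static uint64_t gfd_next_gen = 1;     /* 0 is reserved for "not open" */
static uint64_t gfd_current[GFD_MAX]; /* live generation per fd number */

/* Wrap socket(): every newly created descriptor gets a fresh generation. */
static struct gfd gfd_socket(int domain, int type, int protocol)
{
	struct gfd g = { .fd = socket(domain, type, protocol), .gen = 0 };

	if (g.fd >= GFD_MAX) {
		/* Outside the sketch's table: give the descriptor back. */
		close(g.fd);
		g.fd = -1;
	}
	if (g.fd >= 0) {
		g.gen = gfd_next_gen++;
		gfd_current[g.fd] = g.gen;
	}
	return g;
}

/* True only if g still names the descriptor it was created for. */
static bool gfd_is_current(struct gfd g)
{
	return g.fd >= 0 && g.fd < GFD_MAX && g.gen != 0 &&
	       gfd_current[g.fd] == g.gen;
}

/* Close through the wrapper so a stale copy cannot close a reused fd. */
static int gfd_close(struct gfd g)
{
	if (!gfd_is_current(g))
		return -1;
	gfd_current[g.fd] = 0;
	return close(g.fd);
}

/* Returns 0 if a stale copy is detected after close and re-create. */
int gfd_demo(void)
{
	struct gfd a = gfd_socket(AF_UNIX, SOCK_DGRAM, 0);

	if (a.fd < 0 || gfd_close(a) != 0)
		return -1;

	/* The kernel will usually hand back the same fd number here. */
	struct gfd b = gfd_socket(AF_UNIX, SOCK_DGRAM, 0);

	if (b.fd < 0)
		return -1;

	bool ok = !gfd_is_current(a) && gfd_is_current(b) &&
		  gfd_close(a) == -1;

	gfd_close(b);
	return ok ? 0 : -1;
}
```

The point is that the old copy "a" fails the check whether or not the kernel reuses its fd number for "b", so a late close or poll on the stale copy can be refused instead of hitting an unrelated socket.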