Thought I'd share my findings and proposed solution with the whole group.
On 7/28/17 7:37 AM, Dominique Martinet wrote:
I also get a random crash with ASAN during init, one out of 3-4 times Here's the stack: #0 0x0000000000507470 in __sanitizer::Die() () #1 0x00000000004ea1be in __asan::ReportDeadlySignal(char const*, __sanitizer::SignalContext const&) () #2 0x00000000004e6492 in __asan::AsanOnDeadlySignal(int, void*, void*) () #3 <signal handler called> #4 0x00007ffff58f8edd in svc_override_ops (ops=0x7ffff5b20460 <svc_dg_override_ops.ops>, rendezvous=0x619000073a80) at /export/nfs-ganesha/src/libntirpc/src/svc_internal.h:204 #5 0x00007ffff58f871b in svc_dg_override_ops (xprt=0x6250000de900, rendezvous=0x619000073a80) at /export/nfs-ganesha/src/libntirpc/src/svc_dg.c:495 #6 0x00007ffff58f8126 in svc_dg_rendezvous (xprt=0x619000073a80) at /export/nfs-ganesha/src/libntirpc/src/svc_dg.c:246 [...] The problem is rendezvous->xp_ops is not valid at this point, digging a bit it turns out rendezvous itself has been freed: [...] #6 0x7ffff58dfc7e in clnt_dg_destroy /export/nfs-ganesha/src/libntirpc/src/clnt_dg.c:665:2
I'm not able to reproduce. But Matt and I agree that you've found a race condition that has been there for a long time. It affects both UDP and TCP (and probably RDMA). The window is between the return from epoll_wait() and the SVC_REF() for the task that will handle the signal. A different thread could decrement the counter during that window, find it is transiently zero, and destroy the transport. For transports (SVCXPRT), we only reference 1 count for all the lists and trees where the xprt is initially referenced. There are 3 for TCP (was 4 before Ganesha V2.5/ntirpc 1.5 where I've eliminated a duplicated tree of fds), and 2 for UDP. We don't really want to change that, as each kind of transport can have different lists and trees. There's an easy answer: don't store the xprt, use the fd, and then lookup the fd to see whether it's still in the system. Heck, that's the epoll example code, and they have fd in the event union. That approach has the advantage of being backward API compatible. Matt has rejected that approach as too slow: O(1) in cache, O(log2(n)) otherwise, instead of existing O(0) direct pointer. I've determined that the only other way to handle this window is to ensure that only epoll tasks finally xp_destroy the transport, instead of SVC_RELEASE(). SVC_RELEASE() already sets FLAG_DESTROYING to enforce once-only destroying semantics. We can use that flag in an epoll cleanup routine. There is already a TCP-only routine -- svc_clean_idle2() -- that can be re-purposed to handle UDP and RDMA as well. That routine is only called by epoll loop once every 1024 events. But the transports are relatively small, and we can afford the memory. Because it will be an API change to SVC_RELEASE() -- a macro and inline function -- we cannot backport. Obviously, we haven't seen this very often. But it will be a good reason for moving downstreams to V2.6. ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Nfs-ganesha-devel mailing list Nfs-ganesha-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel