[Nfs-ganesha-devel] destroying timing window

William Allen Simpson Sat, 29 Jul 2017 05:48:42 -0700

Thought I'd share my findings and proposed solution with the whole group.


On 7/28/17 7:37 AM, Dominique Martinet wrote:

I also get a random crash with ASAN during init, one out of 3-4 times

Here's the stack:
#0  0x0000000000507470 in __sanitizer::Die() ()
#1  0x00000000004ea1be in __asan::ReportDeadlySignal(char const*, 
__sanitizer::SignalContext const&) ()
#2  0x00000000004e6492 in __asan::AsanOnDeadlySignal(int, void*, void*) ()
#3  <signal handler called>
#4  0x00007ffff58f8edd in svc_override_ops (ops=0x7ffff5b20460 
<svc_dg_override_ops.ops>, rendezvous=0x619000073a80)
     at /export/nfs-ganesha/src/libntirpc/src/svc_internal.h:204
#5  0x00007ffff58f871b in svc_dg_override_ops (xprt=0x6250000de900, 
rendezvous=0x619000073a80)
     at /export/nfs-ganesha/src/libntirpc/src/svc_dg.c:495
#6  0x00007ffff58f8126 in svc_dg_rendezvous (xprt=0x619000073a80) at 
/export/nfs-ganesha/src/libntirpc/src/svc_dg.c:246
[...]

The problem is rendezvous->xp_ops is not valid at this point, digging a bit it 
turns out rendezvous itself has been freed:
[...]
     #6 0x7ffff58dfc7e in clnt_dg_destroy 
/export/nfs-ganesha/src/libntirpc/src/clnt_dg.c:665:2


I'm not able to reproduce.  But Matt and I agree that you've found a
race condition that has been there for a long time.  It affects both
UDP and TCP (and probably RDMA).

The window is between the return from epoll_wait() and the SVC_REF()
for the task that will handle the signal.  A different thread could
decrement the counter during that window, find it is transiently zero,
and destroy the transport.

For transports (SVCXPRT), we only reference 1 count for all the lists
and trees where the xprt is initially referenced.  There are 3 for TCP
(was 4 before Ganesha V2.5/ntirpc 1.5 where I've eliminated a
duplicated tree of fds), and 2 for UDP.

We don't really want to change that, as each kind of transport can
have different lists and trees.

There's an easy answer: don't store the xprt, use the fd, and then
lookup the fd to see whether it's still in the system.  Heck, that's
the epoll example code, and they have fd in the event union.

That approach has the advantage of being backward API compatible.

Matt has rejected that approach as too slow: O(1) in cache, O(log2(n))
otherwise, instead of existing O(0) direct pointer.

I've determined that the only other way to handle this window is to
ensure that only epoll tasks finally xp_destroy the transport, instead
of SVC_RELEASE().  SVC_RELEASE() already sets FLAG_DESTROYING to
enforce once-only destroying semantics.  We can use that flag in an
epoll cleanup routine.

There is already a TCP-only routine -- svc_clean_idle2() -- that can be
re-purposed to handle UDP and RDMA as well.

That routine is only called by epoll loop once every 1024 events.  But
the transports are relatively small, and we can afford the memory.

Because it will be an API change to SVC_RELEASE() -- a macro and inline
function -- we cannot backport.  Obviously, we haven't seen this very
often.  But it will be a good reason for moving downstreams to V2.6.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel

[Nfs-ganesha-devel] destroying timing window

Reply via email to