discussion in #ganesha :)

On Thu, Aug 10, 2017 at 3:55 PM, Malahal Naineni <mala...@gmail.com> wrote:

> Hi All,
>
> One of our customers reported the following backtrace. The returned
> "rec" seems to be corrupted. Based on oflags, rpc_dplx_lookup_rec() didn't
> allocate the "rec" in this call path. Its refcount is 2. More importantly,
> rec.hdl.xd is 0x51 (a bogus pointer), leading to the crash. GDB data is at
> the end of this email. Note that this crash was observed in the latest
> ganesha 2.3 release.
>
> Looking at rpc_dplx_lookup_rec() and rpc_dplx_unref(), it looks like a
> rec's refcnt can go to 0 and then back up. Also, rpc_dplx_unref() releases
> the rec lock and then acquires the hash lock, to preserve the lock order.
> After the rec lock is dropped at line 359 below, another thread could grab
> the rec and bump its refcnt to 1. That second thread could then call
> rpc_dplx_unref() when it is done, beat the first thread to the free, and
> free the "rec". The first thread accessing "&rec->node_k" at line 361 is
> then in danger of touching freed memory. In any case, this is NOT our
> backtrace here. :-(
>
> Also, the users of this "rec" seem to close the file descriptor and then
> call rpc_dplx_unref(). This has very nasty side effects if my understanding
> is right. Say thread one has fd 100, closes it, and is about to call
> rpc_dplx_unref() to free the "rec", but in the meantime another thread gets
> fd 100 and calls rpc_dplx_lookup_rec(). At that point the second thread is
> going to use the same "rec" as the first thread, correct? Can it happen
> that a "rec" that belonged to UDP is now handed to a thread doing TCP?
> That is one way I can explain this backtrace: the first thread would be
> UDP, which doesn't need "xd", and the second thread TCP, which finds "xd"
> uninitialized because the "rec" was allocated by a UDP thread. If you are
> still reading this email, kudos and a big thank you.
>     357         if (rec->refcnt == 0) {
>     358                 t = rbtx_partition_of_scalar(&rpc_dplx_rec_set.xt, rec->fd_k);
>     359                 REC_UNLOCK(rec);
>     360                 rwlock_wrlock(&t->lock);
>     361                 nv = opr_rbtree_lookup(&t->t, &rec->node_k);
>     362                 rec = NULL;
>
> BORING GDB STUFF:
>
> (gdb) bt
> #0  0x00003fff7aaaceb0 in makefd_xprt (fd=166878, sendsz=262144, recvsz=262144, allocated=0x3ffab97fdb4c)
>     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:436
> #1  0x00003fff7aaad224 in rendezvous_request (xprt=0x1000b125310, req=0x3ffa2c0008f0)
>     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:549
> #2  0x0000000010065104 in thr_decode_rpc_request (context=0x0, xprt=0x1000b125310)
>     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1729
> #3  0x00000000100657f4 in thr_decode_rpc_requests (thr_ctx=0x3ffedc001280)
>     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1853
> #4  0x0000000010195744 in fridgethr_start_routine (arg=0x3ffedc001280)
>     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/support/fridgethr.c:561
>
> (gdb) p oflags
> $1 = 0
> (gdb) p rec->hdl.xd
> $2 = (struct x_vc_data *) 0x51
> (gdb) p *rec
> $3 = {fd_k = 166878,
>   locktrace = {mtx = {__data = {__lock = 2, __count = 0, __owner = 92274, __nusers = 1, __kind = 3,
>         __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>       __size = "\002\000\000\000\000\000\000\000rh\001\000\001\000\000\000\003", '\000' <repeats 22 times>,
>       __align = 2}, func = 0x3fff7aac6ca0 <__func__.8774> "rpc_dplx_ref", line = 89},
>   node_k = {left = 0x0, right = 0x0, parent = 0x3ff9c80034f0, red = 1, gen = 639163},
>   refcnt = 2,
>   send = {lock = {we = {mtx = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3,
>             __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>           __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>, __align = 0},
>         cv = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0,
>             __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
>           __size = '\000' <repeats 47 times>, __align = 0}},
>       lock_flag_value = 0, locktrace = {func = 0x0, line = 0}}},
>   recv = {lock = {we = {mtx = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3,
>             __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>           __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>, __align = 0},
>         cv = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0,
>             __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
>           __size = '\000' <repeats 47 times>, __align = 0}},
>       lock_flag_value = 0, locktrace = {func = 0x3ffc000000d8 "\300L\001", line = 0}}},
>   hdl = {xd = 0x51, xprt = 0x0}}
> (gdb)
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel