The following confirms that Thread1 (TCP) is trying to use the same "rec" as
Thread42 (UDP); it is easy to reproduce on the customer system! A standalone
sketch of the fd-reuse behavior behind this follows the GDB output below.

(gdb) thread 42
[Switching to thread 42 (Thread 0x3fffa98fe850 (LWP 99483))]
#0  0x00003fffb33b1df8 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt 5
#0  0x00003fffb33b1df8 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00003fffb33ab178 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00003fffb330df8c in rpc_dplx_unref (rec=0x3ffeccc25d90, flags=0)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/rpc_dplx.c:350
#3  0x00003fffb330226c in clnt_dg_destroy (clnt=0x3ffecc4c4790)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/clnt_dg.c:709
#4  0x00003fffb331c1d4 in __rpcb_findaddr_timed (program=100024, version=1,
    nconf=0x3ffeccc21230, host=0x102061a8 "localhost", clpp=0x3fffa98fbde8,
    tp=0x3fffb33603e0 <tottimeout>)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/rpcb_clnt.c:821
(More stack frames follow...)
(gdb) thread 1
[Switching to thread 1 (Thread 0x3fff2a8fe850 (LWP 100755))]
#0  0x00003fffb332ceb0 in makefd_xprt (fd=32039, sendsz=262144,
recvsz=262144, allocated=0x3fff2a8fdb4c)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:436
436                     if (!(xd->flags & X_VC_DATA_FLAG_SVC_DESTROYED)) {
(gdb) bt 5
#0  0x00003fffb332ceb0 in makefd_xprt (fd=32039, sendsz=262144,
recvsz=262144, allocated=0x3fff2a8fdb4c)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:436
#1  0x00003fffb332d224 in rendezvous_request (xprt=0x10030fa5b80,
    req=0x3fff200008f0)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:549
#2  0x0000000010065104 in thr_decode_rpc_request (context=0x0,
    xprt=0x10030fa5b80)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1729
#3  0x00000000100657f4 in thr_decode_rpc_requests (thr_ctx=0x3fff1c0008c0)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1853
#4  0x0000000010195744 in fridgethr_start_routine (arg=0x3fff1c0008c0)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/support/fridgethr.c:561
(More stack frames follow...)
(gdb) frame 0
#0  0x00003fffb332ceb0 in makefd_xprt (fd=32039, sendsz=262144,
recvsz=262144, allocated=0x3fff2a8fdb4c)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:436
436                     if (!(xd->flags & X_VC_DATA_FLAG_SVC_DESTROYED)) {
(gdb) p rec
$2 = (struct rpc_dplx_rec *) 0x3ffeccc25d90
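To make the mechanism concrete, here is a minimal standalone C sketch of the
fd reuse I believe is at play. This is my own illustration, not ganesha/ntirpc
code; the comments map each step onto the functions in the backtraces above:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
            /* "Thread42" (UDP): open a socket, then close it.  In ganesha
             * the close happens before clnt_dg_destroy() ->
             * rpc_dplx_unref(), so the rec for this fd can still sit in
             * the hash tree. */
            int udp_fd = socket(AF_INET, SOCK_DGRAM, 0);
            printf("UDP socket got fd %d\n", udp_fd);
            close(udp_fd);

            /* "Thread1" (TCP): the kernel hands out the lowest free fd,
             * i.e. the same number again. */
            int tcp_fd = socket(AF_INET, SOCK_STREAM, 0);
            printf("TCP socket got fd %d\n", tcp_fd);

            /* makefd_xprt() -> rpc_dplx_lookup_rec(tcp_fd) would now find
             * the stale UDP rec (hence oflags == 0 and refcnt == 2), whose
             * hdl.xd was never initialized for TCP -- the bogus 0x51
             * pointer in the crash. */
            close(tcp_fd);
            return 0;
    }

On an otherwise idle process, both printf calls report the same fd number.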


On Fri, Aug 11, 2017 at 2:05 AM, Matt Benjamin <mbenj...@redhat.com> wrote:

> discussion in #ganesha :)
>
> On Thu, Aug 10, 2017 at 3:55 PM, Malahal Naineni <mala...@gmail.com> wrote:
> > Hi All,
> >
> >         One of our customers reported the following backtrace. The
> > returned "rec" seems to be corrupted. Based on oflags,
> > rpc_dplx_lookup_rec() didn't allocate the "rec" in this call path. Its
> > refcount is 2. More importantly, rec.hdl.xd is 0x51 (a bogus pointer),
> > leading to the crash. GDB data is at the end of this email. Note that
> > this crash is observed in the latest ganesha2.3 release.
> >
> > Looking at rpc_dplx_lookup_rec() and rpc_dplx_unref(), it looks like a
> > rec's refcnt can go to 0 and then back up. Also, rpc_dplx_unref()
> > releases the rec-lock and then acquires the hash-lock to preserve the
> > lock order. After dropping the lock at line 359 below, someone else
> > could grab the rec and change its refcnt to 1. That second thread could
> > then call rpc_dplx_unref() after it is done, beat the first thread to
> > it, and free the "rec". The first thread accessing "&rec->node_k" at
> > line 361 is then in danger, as it might be accessing freed memory. In
> > any case, this is NOT our backtrace here. :-(
> >
> > Also, looking at the users of this "rec", they seem to close the file
> > descriptor and then call rpc_dplx_unref(). This has very nasty side
> > effects if my understanding is right. Say thread one has fd 100; it
> > closed it and is calling rpc_dplx_unref() to free the "rec", but in the
> > meantime another thread gets fd 100 and calls rpc_dplx_lookup_rec().
> > At this point the second thread is going to use the same "rec" as the
> > first thread, correct? Can it happen that a "rec" that belonged to UDP
> > is now being given to a thread doing TCP? This is one way I can explain
> > the backtrace! The first thread has to be UDP, which doesn't need "xd",
> > and the second thread should be TCP, which finds that "xd" is
> > uninitialized because the "rec" was allocated by a UDP thread. If you
> > are still reading this email, kudos and a big thank you.
> >
> > 357         if (rec->refcnt == 0) {
> > 358                 t = rbtx_partition_of_scalar(&rpc_dplx_rec_set.xt, rec->fd_k);
> > 359                 REC_UNLOCK(rec);
> > 360                 rwlock_wrlock(&t->lock);
> > 361                 nv = opr_rbtree_lookup(&t->t, &rec->node_k);
> > 362                 rec = NULL;
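
One shape a fix for the quoted race could take, purely as an untested sketch
from me: re-take the rec-lock after the hash-lock (keeping the
hash-lock-before-rec-lock order described above) and re-check refcnt before
touching the tree. I'm assuming REC_LOCK exists as the counterpart of the
REC_UNLOCK above, and rwlock_unlock as the release for rwlock_wrlock:

        if (rec->refcnt == 0) {
                t = rbtx_partition_of_scalar(&rpc_dplx_rec_set.xt, rec->fd_k);
                REC_UNLOCK(rec);
                rwlock_wrlock(&t->lock);   /* hash-lock first, per lock order */
                REC_LOCK(rec);             /* ...then the rec-lock again */
                if (rec->refcnt > 0) {
                        /* someone resurrected the rec while we held
                         * neither lock; it must stay in the tree */
                        REC_UNLOCK(rec);
                        rwlock_unlock(&t->lock);
                        return;
                }
                REC_UNLOCK(rec);
                nv = opr_rbtree_lookup(&t->t, &rec->node_k);
                rec = NULL;
                /* ... removal/free continues as before ... */
        }

This only narrows the refcnt 0-and-back-up window; it does not address the
fd-reuse problem above.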
> >
> >
> > BORING GDB STUFF:
> >
> > (gdb) bt
> > #0  0x00003fff7aaaceb0 in makefd_xprt (fd=166878, sendsz=262144,
> >     recvsz=262144, allocated=0x3ffab97fdb4c)
> >     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:436
> > #1  0x00003fff7aaad224 in rendezvous_request (xprt=0x1000b125310,
> >     req=0x3ffa2c0008f0)
> >     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:549
> > #2  0x0000000010065104 in thr_decode_rpc_request (context=0x0,
> >     xprt=0x1000b125310)
> >     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1729
> > #3  0x00000000100657f4 in thr_decode_rpc_requests (thr_ctx=0x3ffedc001280)
> >     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1853
> > #4  0x0000000010195744 in fridgethr_start_routine (arg=0x3ffedc001280)
> >     at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/support/fridgethr.c:561
> >
> > (gdb) p oflags
> > $1 = 0
> > (gdb) p rec->hdl.xd
> > $2 = (struct x_vc_data *) 0x51
> > (gdb) p *rec
> > $3 = {fd_k = 166878, locktrace = {mtx = {__data = {__lock = 2,
> >         __count = 0, __owner = 92274, __nusers = 1, __kind = 3,
> >         __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
> >       __size = "\002\000\000\000\000\000\000\000rh\001\000\001\000\000\000\003", '\000' <repeats 22 times>,
> >       __align = 2}, func = 0x3fff7aac6ca0 <__func__.8774> "rpc_dplx_ref",
> >     line = 89}, node_k = {left = 0x0, right = 0x0,
> >     parent = 0x3ff9c80034f0, red = 1, gen = 639163}, refcnt = 2,
> >   send = {lock = {we = {mtx = {__data = {__lock = 0, __count = 0,
> >             __owner = 0, __nusers = 0, __kind = 3, __spins = 0,
> >             __list = {__prev = 0x0, __next = 0x0}},
> >           __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>,
> >           __align = 0},
> >         cv = {__data = {__lock = 0, __futex = 0, __total_seq = 0,
> >             __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
> >             __nwaiters = 0, __broadcast_seq = 0},
> >           __size = '\000' <repeats 47 times>, __align = 0}},
> >       lock_flag_value = 0, locktrace = {func = 0x0, line = 0}}},
> >   recv = {lock = {we = {mtx = {__data = {__lock = 0, __count = 0,
> >             __owner = 0, __nusers = 0, __kind = 3, __spins = 0,
> >             __list = {__prev = 0x0, __next = 0x0}},
> >           __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>,
> >           __align = 0},
> >         cv = {__data = {__lock = 0, __futex = 0, __total_seq = 0,
> >             __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
> >             __nwaiters = 0, __broadcast_seq = 0},
> >           __size = '\000' <repeats 47 times>, __align = 0}},
> >       lock_flag_value = 0, locktrace = {func = 0x3ffc000000d8 "\300L\001",
> >         line = 0}}}, hdl = {xd = 0x51, xprt = 0x0}}
> > (gdb)