discussion in #ganesha :)

On Thu, Aug 10, 2017 at 3:55 PM, Malahal Naineni <mala...@gmail.com> wrote:
> Hi All,
>
>         One of our customers reported the backtrace below. The returned
> "rec" appears to be corrupted. Based on oflags, rpc_dplx_lookup_rec() did
> not allocate the "rec" in this call path; its refcnt is 2. More importantly,
> rec->hdl.xd is 0x51 (a bogus pointer), which leads to the crash. GDB data is
> at the end of this email. Note that this crash was observed on the latest
> ganesha 2.3 release.
>
> Looking at rpc_dplx_lookup_rec() and rpc_dplx_unref(), it looks like a
> rec's refcnt can drop to 0 and then go back up. Also, rpc_dplx_unref()
> releases the rec lock and then acquires the hash lock to preserve the lock
> order. After dropping the lock at line 359 below, another thread could grab
> the rec and bump refcnt to 1. That second thread, having raced ahead of the
> first, could then call rpc_dplx_unref() itself and free the "rec". The
> first thread's access to "&rec->node_k" at line 361 is then dangerous, as
> it may touch freed memory. In any case, this is NOT our backtrace here. :-(
>
> Also, the users of this "rec" seem to close the file descriptor and then
> call rpc_dplx_unref(). If my understanding is right, this has very nasty
> side effects. Say thread one has fd 100, closes it, and is calling
> rpc_dplx_unref() to free the "rec"; in the meantime another thread gets fd
> 100 and calls rpc_dplx_lookup_rec(). At that point the second thread will
> use the same "rec" as the first thread, correct? Can it happen that a "rec"
> that belonged to UDP is handed to a thread doing TCP? This is one way I can
> explain the backtrace: the first thread would be UDP, which doesn't need
> "xd", and the second thread would be TCP, which finds "xd" uninitialized
> because the "rec" was allocated by a UDP thread. If you are still reading
> this email, kudos and a big thank you.
>
> 357         if (rec->refcnt == 0) {
> 358                 t = rbtx_partition_of_scalar(&rpc_dplx_rec_set.xt, rec->fd_k);
> 359                 REC_UNLOCK(rec);
> 360                 rwlock_wrlock(&t->lock);
> 361                 nv = opr_rbtree_lookup(&t->t, &rec->node_k);
> 362                 rec = NULL;
>
>
> BORING GDB STUFF:
>
> (gdb) bt
> #0  0x00003fff7aaaceb0 in makefd_xprt (fd=166878, sendsz=262144,
> recvsz=262144, allocated=0x3ffab97fdb4c)
>     at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:436
> #1  0x00003fff7aaad224 in rendezvous_request (xprt=0x1000b125310,
> req=0x3ffa2c0008f0)
>     at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:549
> #2  0x0000000010065104 in thr_decode_rpc_request (context=0x0,
> xprt=0x1000b125310)
>     at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1729
> #3  0x00000000100657f4 in thr_decode_rpc_requests (thr_ctx=0x3ffedc001280)
>     at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1853
> #4  0x0000000010195744 in fridgethr_start_routine (arg=0x3ffedc001280)
>     at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/support/fridgethr.c:561
>
> (gdb) p oflags
> $1 = 0
> (gdb) p rec->hdl.xd
> $2 = (struct x_vc_data *) 0x51
> (gdb) p *rec
> $3 = {fd_k = 166878, locktrace = {mtx = {__data = {__lock = 2, __count = 0,
> __owner = 92274, __nusers = 1, __kind = 3,
>         __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>       __size =
> "\002\000\000\000\000\000\000\000rh\001\000\001\000\000\000\003", '\000'
> <repeats 22 times>,
>       __align = 2}, func = 0x3fff7aac6ca0 <__func__.8774> "rpc_dplx_ref",
> line = 89}, node_k = {left = 0x0,
>     right = 0x0, parent = 0x3ff9c80034f0, red = 1, gen = 639163}, refcnt =
> 2, send = {lock = {we = {mtx = {__data = {
>             __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3,
> __spins = 0, __list = {__prev = 0x0,
>               __next = 0x0}}, __size = '\000' <repeats 16 times>, "\003",
> '\000' <repeats 22 times>, __align = 0},
>         cv = {__data = {__lock = 0, __futex = 0, __total_seq = 0,
> __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
>             __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats
> 47 times>, __align = 0}},
>       lock_flag_value = 0, locktrace = {func = 0x0, line = 0}}}, recv =
> {lock = {we = {mtx = {__data = {__lock = 0,
>             __count = 0, __owner = 0, __nusers = 0, __kind = 3, __spins = 0,
> __list = {__prev = 0x0, __next = 0x0}},
>           __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22
> times>, __align = 0}, cv = {__data = {
>             __lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
> __woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
>             __broadcast_seq = 0}, __size = '\000' <repeats 47 times>,
> __align = 0}}, lock_flag_value = 0, locktrace = {
>         func = 0x3ffc000000d8 "\300L\001", line = 0}}}, hdl = {xd = 0x51,
> xprt = 0x0}}
> (gdb)
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
