Hi All,

One of our customers reported the backtrace shown at the end of this email.
The returned "rec" seems to be corrupted. Based on oflags (0),
rpc_dplx_lookup_rec() did not allocate the "rec" in this call path, and its
refcnt is 2. More importantly, rec.hdl.xd is 0x51 (a bogus pointer), which
leads to the crash. The GDB data is at the end of this email. Note that this
crash was observed with the latest Ganesha 2.3 release.

Looking at rpc_dplx_lookup_rec() and rpc_dplx_unref(), it looks like a rec's
refcnt can drop to 0 and then go back up. Also, rpc_dplx_unref() releases the
rec lock and then acquires the hash lock to preserve the lock order. After
the first thread drops the rec lock at line 359 below, a second thread could
grab the rec and raise refcnt to 1. If that second thread finishes its work
and calls rpc_dplx_unref() before the first thread gets any further, it can
free the "rec". The first thread then accesses "&rec->node_k" at line 361 and
may be touching freed memory. In any case, this is NOT our backtrace here. :-(

Also, the users of this "rec" seem to close the file descriptor first and
only then call rpc_dplx_unref(). If my understanding is right, this has very
nasty side effects. Say thread one owns fd 100, closes it, and is about to
call rpc_dplx_unref() to free the "rec". In the meantime, another thread is
handed fd 100 and calls rpc_dplx_lookup_rec(). At this point the second
thread is going to use the same "rec" as the first thread, correct? Can it
happen that a "rec" that belonged to UDP is now being given to a thread
doing TCP? This is one way I can explain the backtrace! The first thread
would have to be UDP, which doesn't need "xd", and the second thread TCP,
which finds "xd" uninitialized because the "rec" was allocated by a UDP
thread. If you are still reading this email, kudos and a big thank you.
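
The fd reuse part of this theory is easy to show in isolation. The sketch
below is only an illustration, not Ganesha code: toy_kind_by_fd and
toy_stale_kind_after_reuse() are made-up names, the "per-fd state" is a plain
array, and /dev/null stands in for a socket. It relies on the POSIX rule that
open() returns the lowest-numbered unused descriptor, so in a single-threaded
process the number that was just closed comes straight back.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical per-fd state, standing in for the rec keyed by fd_k. */
enum toy_kind { TOY_NONE = 0, TOY_UDP, TOY_TCP };

#define TOY_MAX_FD 1024
static enum toy_kind toy_kind_by_fd[TOY_MAX_FD];

/*
 * Close an fd BEFORE dropping its per-fd state, then open a new fd.
 * Returns the state the new owner sees (TOY_UDP if the stale state
 * leaked through), or -1 on error.
 */
int toy_stale_kind_after_reuse(void)
{
	int first, second, seen;

	first = open("/dev/null", O_RDONLY);
	if (first < 0)
		return -1;
	if (first >= TOY_MAX_FD) {
		close(first);
		return -1;
	}
	toy_kind_by_fd[first] = TOY_UDP;	/* "rec" set up by a UDP user */

	close(first);				/* fd closed, state still there */

	second = open("/dev/null", O_RDONLY);	/* new, unrelated connection */
	if (second < 0) {
		toy_kind_by_fd[first] = TOY_NONE;
		return -1;
	}
	if (second >= TOY_MAX_FD) {
		close(second);
		toy_kind_by_fd[first] = TOY_NONE;
		return -1;
	}
	seen = toy_kind_by_fd[second];		/* what the new owner finds */

	toy_kind_by_fd[first] = TOY_NONE;	/* the late "unref" */
	close(second);
	return seen;
}
```

In a multi-threaded server the reused number would typically show up in some
other thread's accept() or socket() call rather than in the same function, but
the ordering problem is the same: the state keyed by the fd outlives the fd.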

357         if (rec->refcnt == 0) {
358                 t = rbtx_partition_of_scalar(&rpc_dplx_rec_set.xt, rec->fd_k);
359                 REC_UNLOCK(rec);
360                 rwlock_wrlock(&t->lock);
361                 nv = opr_rbtree_lookup(&t->t, &rec->node_k);
362                 rec = NULL;


BORING GDB STUFF:

(gdb) bt
#0  0x00003fff7aaaceb0 in makefd_xprt (fd=166878, sendsz=262144, recvsz=262144, allocated=0x3ffab97fdb4c)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:436
#1  0x00003fff7aaad224 in rendezvous_request (xprt=0x1000b125310, req=0x3ffa2c0008f0)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:549
#2  0x0000000010065104 in thr_decode_rpc_request (context=0x0, xprt=0x1000b125310)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1729
#3  0x00000000100657f4 in thr_decode_rpc_requests (thr_ctx=0x3ffedc001280)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1853
#4  0x0000000010195744 in fridgethr_start_routine (arg=0x3ffedc001280)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/support/fridgethr.c:561

(gdb) p oflags
$1 = 0
(gdb) p rec->hdl.xd
$2 = (struct x_vc_data *) 0x51
(gdb) p *rec
$3 = {fd_k = 166878, locktrace = {mtx = {__data = {__lock = 2, __count = 0,
__owner = 92274, __nusers = 1, __kind = 3,
        __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
      __size =
"\002\000\000\000\000\000\000\000rh\001\000\001\000\000\000\003", '\000'
<repeats 22 times>,
      __align = 2}, func = 0x3fff7aac6ca0 <__func__.8774> "rpc_dplx_ref",
line = 89}, node_k = {left = 0x0,
    right = 0x0, parent = 0x3ff9c80034f0, red = 1, gen = 639163}, refcnt =
2, send = {lock = {we = {mtx = {__data = {
            __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3,
__spins = 0, __list = {__prev = 0x0,
              __next = 0x0}}, __size = '\000' <repeats 16 times>, "\003",
'\000' <repeats 22 times>, __align = 0},
        cv = {__data = {__lock = 0, __futex = 0, __total_seq = 0,
__wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
            __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats
47 times>, __align = 0}},
      lock_flag_value = 0, locktrace = {func = 0x0, line = 0}}}, recv =
{lock = {we = {mtx = {__data = {__lock = 0,
            __count = 0, __owner = 0, __nusers = 0, __kind = 3, __spins =
0, __list = {__prev = 0x0, __next = 0x0}},
          __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22
times>, __align = 0}, cv = {__data = {
            __lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
__woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
            __broadcast_seq = 0}, __size = '\000' <repeats 47 times>,
__align = 0}}, lock_flag_value = 0, locktrace = {
        func = 0x3ffc000000d8 "\300L\001", line = 0}}}, hdl = {xd = 0x51,
xprt = 0x0}}
(gdb)