I think we need to ensure that the partition lock is taken before the qlane lock. I have a patch for this, but it introduces a refcount issue that I'm still debugging.
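A minimal sketch of that ordering rule follows. It assumes plain pthread locks as stand-ins for the two Ganesha locks: `part` plays a hash-partition lock and `lane` an LRU queue-lane lock. The names `reap_one_ordered()`, `lookup_ordered()` and `run_ordered_demo()` are hypothetical, not Ganesha code. The only point is that every path needing both locks takes the partition lock first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for an MDCACHE hash partition and an LRU
 * queue lane; these are NOT the real Ganesha types. */
struct partition { pthread_rwlock_t lock; };
struct qlane { pthread_mutex_t mtx; long reaped; };

static struct partition part = { PTHREAD_RWLOCK_INITIALIZER };
static struct qlane lane = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Reaper path: partition lock (write) first, then the lane lock. */
static void reap_one_ordered(void)
{
	pthread_rwlock_wrlock(&part.lock);   /* 1: partition */
	pthread_mutex_lock(&lane.mtx);       /* 2: queue lane */
	lane.reaped++;                       /* ... remove from hash and queue ... */
	pthread_mutex_unlock(&lane.mtx);
	pthread_rwlock_unlock(&part.lock);
}

/* Lookup path: partition lock (read) first, then the lane lock,
 * e.g. to take the initial ref. Same global order as the reaper. */
static void lookup_ordered(void)
{
	pthread_rwlock_rdlock(&part.lock);   /* 1: partition */
	pthread_mutex_lock(&lane.mtx);       /* 2: queue lane */
	pthread_mutex_unlock(&lane.mtx);
	pthread_rwlock_unlock(&part.lock);
}

static void *reaper(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++)
		reap_one_ordered();
	return NULL;
}

static void *looker(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++)
		lookup_ordered();
	return NULL;
}

/* Runs both paths concurrently; returns the number of reaps done.
 * Because no thread waits for the partition lock while holding the
 * lane lock, the cycle seen in the traces cannot form. */
static long run_ordered_demo(long iterations)
{
	pthread_t a, b;

	lane.reaped = 0;
	pthread_create(&a, NULL, reaper, &iterations);
	pthread_create(&b, NULL, looker, &iterations);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return lane.reaped;
}
```

Calling `run_ordered_demo(10000)` returns 10000 instead of hanging.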

Daniel

On 08/03/2017 08:52 PM, Pradeep wrote:
Thanks, Frank. I merged your patch and am now hitting another deadlock. Here are the two threads:

The thread below holds the partition lock in read mode and is trying to acquire the queue lock:

Thread 143 (Thread 0x7faf82f72700 (LWP 143573)):
#0  0x00007fafd1c371bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fafd1c32d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fafd1c32c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000005221fd in _mdcache_lru_ref (entry=0x7fae78d19000, flags=2, func=0x58ec80 <__func__.23467> "mdcache_find_keyed", line=881) at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1813
#4  0x0000000000532686 in mdcache_find_keyed (key=0x7faf82f70760, entry=0x7faf82f707e8) at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:881

     874         *entry = cih_get_by_key_latch(key, &latch,
     875                                         CIH_GET_RLOCK | CIH_GET_UNLOCK_ON_MISS,
     876                                         __func__, __LINE__);
     877         if (likely(*entry)) {
     878                 fsal_status_t status;
     879
     880                 /* Initial Ref on entry */
     881                 status = mdcache_lru_ref(*entry, LRU_REQ_INITIAL);


This thread already holds the queue lock and is trying to acquire the partition lock in write mode:

Thread 188 (Thread 0x7faf9979f700 (LWP 143528)):
#0  0x00007fafd1c3403e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x000000000052fc61 in cih_remove_checked (entry=0x7fad62914e00) at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
#2  0x0000000000530b3e in mdc_clean_entry (entry=0x7fad62914e00) at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:272
#3  0x000000000051df7e in mdcache_lru_clean (entry=0x7fad62914e00) at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:590
#4  0x0000000000522cca in _mdcache_lru_unref (entry=0x7fad62914e00, flags=8, func=0x58b700 <__func__.23710> "lru_reap_impl", line=690) at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1922
#5  0x000000000051ea38 in lru_reap_impl (qid=LRU_ENTRY_L1) at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:690
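In this trace the thread that holds the queue lock still needs the partition lock. Where a path like this cannot easily be reordered, a common alternative is try-lock with back-off. This technique is not something proposed in this thread. The thread holding the lane lock tries the partition lock without blocking. On contention, it drops the lane lock and re-acquires both in partition-then-lane order. The sketch below uses hypothetical pthread stand-ins (`part_lock`, `qlane_mtx`, `lock_partition_from_qlane()`), not Ganesha code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins, not Ganesha code: part_lock plays the hash
 * partition lock, qlane_mtx the LRU queue-lane lock. */
static pthread_rwlock_t part_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t qlane_mtx = PTHREAD_MUTEX_INITIALIZER;
static long removed; /* only touched with both locks held */

/* Called with qlane_mtx held; returns with qlane_mtx AND part_lock
 * (write) held. If the partition lock is busy, back off instead of
 * blocking: release the lane, then take both in partition -> lane order. */
static void lock_partition_from_qlane(void)
{
	if (pthread_rwlock_trywrlock(&part_lock) == 0)
		return;                         /* no contention */
	pthread_mutex_unlock(&qlane_mtx);   /* back off: drop the lane lock */
	pthread_rwlock_wrlock(&part_lock);  /* canonical order: partition ... */
	pthread_mutex_lock(&qlane_mtx);     /* ... then lane */
}

/* Reaper: starts from the lane lock, like the reap loop in the traces. */
static void *reaper(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++) {
		pthread_mutex_lock(&qlane_mtx);
		lock_partition_from_qlane();
		/* If we backed off, lane state may have changed; a real
		 * caller would re-validate the entry here. */
		removed++;
		pthread_rwlock_unlock(&part_lock);
		pthread_mutex_unlock(&qlane_mtx);
	}
	return NULL;
}

/* Lookup: partition lock (read) first, then the lane lock. */
static void *lookup(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++) {
		pthread_rwlock_rdlock(&part_lock);
		pthread_mutex_lock(&qlane_mtx);
		pthread_mutex_unlock(&qlane_mtx);
		pthread_rwlock_unlock(&part_lock);
	}
	return NULL;
}

static long run_backoff_demo(long n)
{
	pthread_t a, b;

	removed = 0;
	pthread_create(&a, NULL, reaper, &n);
	pthread_create(&b, NULL, lookup, &n);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return removed;
}
```

The cost of this approach is that anything observed under the lane lock may be stale after backing off. A real caller would therefore have to re-check the entry (for example, that it is still on the lane) before acting on it.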




On Fri, Jul 28, 2017 at 1:34 PM, Frank Filz <ffilz...@mindspring.com> wrote:

    Hmm, well, that’s easy to fix…

    Instead of:

        mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);
        goto next_lane;

    It could:

        QUNLOCK(qlane);
        mdcache_put(entry);
        continue;

    __ __

    Fix posted here:____

    __ __

    https://review.gerrithub.io/371764
    <https://review.gerrithub.io/371764>____

    __ __

    Frank____
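The shape of the change above can be modeled in isolation. The model assumes, as the traces suggest, that dropping the last reference can reach `cih_remove_checked()` and therefore needs the partition lock in write mode. In the sketch, `entry_put()` is a hypothetical stand-in for `mdcache_put()`. The lookup thread takes the partition lock and then the lane lock, as `mdcache_find_keyed()` does in the later trace. None of these names are Ganesha code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins, not Ganesha's real types or functions. */
static pthread_rwlock_t part_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t qlane_mtx = PTHREAD_MUTEX_INITIALIZER;
static atomic_long cleaned;

struct entry { atomic_int refcnt; };

/* Dropping the last ref "cleans" the entry, which needs the partition
 * lock in write mode (the role cih_remove_checked() plays). */
static void entry_put(struct entry *e)
{
	if (atomic_fetch_sub(&e->refcnt, 1) == 1) {
		pthread_rwlock_wrlock(&part_lock);
		atomic_fetch_add(&cleaned, 1);
		pthread_rwlock_unlock(&part_lock);
	}
}

/* Reaper step in the shape of the fix: take the ref under the lane
 * lock, but release the lane lock (QUNLOCK) before the put, so the
 * partition lock is never requested while the lane lock is held. */
static void reap_step(struct entry *e)
{
	pthread_mutex_lock(&qlane_mtx);
	atomic_fetch_add(&e->refcnt, 1);
	/* ... decide this entry can't be used ... */
	pthread_mutex_unlock(&qlane_mtx);
	entry_put(e);                 /* may take part_lock: lane not held */
}

static void *reaper(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++) {
		struct entry e;

		atomic_init(&e.refcnt, 0);
		reap_step(&e);        /* ref goes 0 -> 1 -> 0: triggers cleanup */
	}
	return NULL;
}

/* Lookup path: partition lock (read) first, then the lane lock. */
static void *lookup(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++) {
		pthread_rwlock_rdlock(&part_lock);
		pthread_mutex_lock(&qlane_mtx);
		pthread_mutex_unlock(&qlane_mtx);
		pthread_rwlock_unlock(&part_lock);
	}
	return NULL;
}

static long run_unlock_before_put_demo(long n)
{
	pthread_t a, b;

	atomic_store(&cleaned, 0);
	pthread_create(&a, NULL, reaper, &n);
	pthread_create(&b, NULL, lookup, &n);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return atomic_load(&cleaned);
}
```

Had `reap_step()` called `entry_put()` before unlocking `qlane_mtx`, the two threads could block each other exactly as in the traces. With the unlock moved first, every cleanup completes.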


    *From:* Pradeep [mailto:pradeep.tho...@gmail.com]
    *Sent:* Friday, July 28, 2017 12:44 PM
    *To:* nfs-ganesha-devel@lists.sourceforge.net
    *Subject:* [Nfs-ganesha-devel] deadlock in lru_reap_impl()

    I'm hitting another deadlock in mdcache with the 2.5.1 base. In this
    case, two threads are at different places in lru_reap_impl():

    Thread 1:

         636                 QLOCK(qlane);
         637                 lru = glist_first_entry(&lq->q, mdcache_lru_t, q);
         638                 if (!lru)
         639                         goto next_lane;
         640                 refcnt = atomic_inc_int32_t(&lru->refcnt);
         641                 entry = container_of(lru, mdcache_entry_t, lru);
         642                 if (unlikely(refcnt != (LRU_SENTINEL_REFCOUNT + 1))) {
         643                         /* cant use it. */
         644                         mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);

    mdcache_lru_unref() can lead to the following call chain:

        mdcache_lru_unref() -> mdcache_lru_clean() -> mdc_clean_entry() -> cih_remove_checked()

    cih_remove_checked() tries to take the partition lock in write mode,
    but Thread 2 already holds that lock and is waiting for the queue lane
    lock, which Thread 1 took at line 636.

    Thread 2:

         650                 if (cih_latch_entry(&entry->fh_hk.key, &latch, CIH_GET_WLOCK,
         651                                     __func__, __LINE__)) {
         652                         QLOCK(qlane);
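The cycle described above can be reproduced outside Ganesha with two pthread locks. In this sketch, `qlane_mtx` plays the queue lane lock and `part_lock` the partition lock. These names, and `run_abba_demo()`, are hypothetical. A barrier makes each thread hold its first lock before it requests the second. Timed lock calls (`pthread_mutex_timedlock()`, `pthread_rwlock_timedwrlock()`) make the program report the deadlock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the partition lock and the lane lock. */
static pthread_rwlock_t part_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t qlane_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first_lock;

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_in_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec += 1;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* Plays "Thread 1": holds the lane lock (QLOCK), then wants the
 * partition lock in write mode (as cih_remove_checked() does). */
static void *thread1(void *arg)
{
	int *timed_out = arg;
	struct timespec ts;

	pthread_mutex_lock(&qlane_mtx);
	pthread_barrier_wait(&both_hold_first_lock);
	ts = deadline_in_ms(200);
	if (pthread_rwlock_timedwrlock(&part_lock, &ts) == 0)
		pthread_rwlock_unlock(&part_lock);
	else
		*timed_out = 1;
	pthread_mutex_unlock(&qlane_mtx);
	return NULL;
}

/* Plays "Thread 2": holds the partition lock in write mode
 * (cih_latch_entry() with CIH_GET_WLOCK), then wants the lane lock. */
static void *thread2(void *arg)
{
	int *timed_out = arg;
	struct timespec ts;

	pthread_rwlock_wrlock(&part_lock);
	pthread_barrier_wait(&both_hold_first_lock);
	ts = deadline_in_ms(200);
	if (pthread_mutex_timedlock(&qlane_mtx, &ts) == 0)
		pthread_mutex_unlock(&qlane_mtx);
	else
		*timed_out = 1;
	pthread_rwlock_unlock(&part_lock);
	return NULL;
}

/* Returns how many of the two threads gave up (1 or 2). With plain
 * blocking lock calls, both threads would wait forever. */
static int run_abba_demo(void)
{
	int t1 = 0, t2 = 0;
	pthread_t a, b;

	pthread_barrier_init(&both_hold_first_lock, NULL, 2);
	pthread_create(&a, NULL, thread1, &t1);
	pthread_create(&b, NULL, thread2, &t2);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_barrier_destroy(&both_hold_first_lock);
	return t1 + t2;
}
```

At least one thread always times out. Each thread keeps the lock the other wants until its own attempt has ended, so the two attempts cannot both succeed.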


    Stack traces:

    Thread 1:

    #0  0x00007f571328103e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
    #1  0x000000000052f928 in cih_remove_checked (entry=0x7f548e86c400)
        at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
    #2  0x0000000000530805 in mdc_clean_entry (entry=0x7f548e86c400)
        at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:272
    #3  0x000000000051df7e in mdcache_lru_clean (entry=0x7f548e86c400)
        at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:590
    #4  0x00000000005229c0 in _mdcache_lru_unref (entry=0x7f548e86c400, flags=8, func=0x58b5c0 <__func__.23710> "lru_reap_impl", line=687)
        at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1918
    #5  0x000000000051e83a in lru_reap_impl (qid=LRU_ENTRY_L1)
        at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:687

    Thread 2:

    #0  0x00007f57132841bd in __lll_lock_wait () from /lib64/libpthread.so.0
    #1  0x00007f571327fd02 in _L_lock_791 () from /lib64/libpthread.so.0
    #2  0x00007f571327fc08 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #3  0x000000000051e4f5 in lru_reap_impl (qid=LRU_ENTRY_L1)
        at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:652





------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot



_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


