I think we need to ensure that the partition lock is taken before the
qlane lock. I have a patch for this, but it introduces a refcount issue
that I'm still debugging.
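One way to make the rule "partition lock before qlane lock" checkable is a per-thread lock-rank assertion, similar in spirit to the Linux kernel's lockdep. This is a minimal sketch, not Ganesha code: `lock_order_acquire()`, `lock_order_release()` and the `RANK_*` values are hypothetical, and it assumes locks are released in the reverse order of acquisition. A caller would invoke `lock_order_acquire()` just before taking a lock and `lock_order_release()` just after dropping it.

```c
#include <assert.h>

/* Hypothetical ranks: a lock may only be taken while every lock already
 * held has a strictly lower rank (partition before qlane). */
enum lock_rank { RANK_PARTITION = 1, RANK_QLANE = 2 };

#define MAX_HELD 8

/* Per-thread stack of ranks currently held; ranks on it are strictly
 * increasing, so checking the top entry is enough. */
static _Thread_local int held[MAX_HELD];
static _Thread_local int nheld;

/* Call just before taking a lock. Returns 0 if the order is respected,
 * -1 if taking it now would invert the order (or the stack is full). */
static int lock_order_acquire(enum lock_rank r)
{
	if (nheld > 0 && held[nheld - 1] >= (int)r)
		return -1;
	if (nheld == MAX_HELD)
		return -1;
	held[nheld++] = (int)r;
	return 0;
}

/* Call just after releasing the most recently acquired lock. */
static void lock_order_release(void)
{
	assert(nheld > 0);
	nheld--;
}
```

With these ranks, the lru_reap_impl() path seen in the traces (qlane, then partition) fails at the second acquisition, while the cih_latch_entry()-then-QLOCK() path passes.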
Daniel
On 08/03/2017 08:52 PM, Pradeep wrote:
Thanks, Frank. I merged your patch and am now hitting another deadlock.
Here are the two threads.
The thread below holds the partition lock in read mode and is trying to
acquire the queue lock:
Thread 143 (Thread 0x7faf82f72700 (LWP 143573)):
#0 0x00007fafd1c371bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fafd1c32d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007fafd1c32c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00000000005221fd in _mdcache_lru_ref (entry=0x7fae78d19000,
flags=2, func=0x58ec80 <__func__.23467> "mdcache_find_keyed", line=881)
at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1813
#4 0x0000000000532686 in mdcache_find_keyed (key=0x7faf82f70760,
entry=0x7faf82f707e8) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:881
874     *entry = cih_get_by_key_latch(key, &latch,
875                                   CIH_GET_RLOCK | CIH_GET_UNLOCK_ON_MISS,
876                                   __func__, __LINE__);
877     if (likely(*entry)) {
878             fsal_status_t status;
879
880             /* Initial Ref on entry */
881             status = mdcache_lru_ref(*entry, LRU_REQ_INITIAL);
This thread already holds the queue lock and is trying to acquire the
partition lock in write mode:
Thread 188 (Thread 0x7faf9979f700 (LWP 143528)):
#0 0x00007fafd1c3403e in pthread_rwlock_wrlock () from
/lib64/libpthread.so.0
#1 0x000000000052fc61 in cih_remove_checked (entry=0x7fad62914e00) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
#2 0x0000000000530b3e in mdc_clean_entry (entry=0x7fad62914e00) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:272
#3 0x000000000051df7e in mdcache_lru_clean (entry=0x7fad62914e00) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:590
#4 0x0000000000522cca in _mdcache_lru_unref (entry=0x7fad62914e00,
flags=8, func=0x58b700 <__func__.23710> "lru_reap_impl", line=690) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1922
#5 0x000000000051ea38 in lru_reap_impl (qid=LRU_ENTRY_L1) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:690
On Fri, Jul 28, 2017 at 1:34 PM, Frank Filz <ffilz...@mindspring.com> wrote:
Hmm, well, that’s easy to fix…

Instead of:

    mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);
    goto next_lane;

it could do:

    QUNLOCK(qlane);
    mdcache_put(entry);
    continue;

Fix posted here:

https://review.gerrithub.io/371764

Frank

From: Pradeep [mailto:pradeep.tho...@gmail.com]
Sent: Friday, July 28, 2017 12:44 PM
To: nfs-ganesha-devel@lists.sourceforge.net
Subject: [Nfs-ganesha-devel] deadlock in lru_reap_impl()

I'm hitting another deadlock in mdcache with a 2.5.1 base. In this
case, the two threads are at different places in lru_reap_impl().

Thread 1:

636     QLOCK(qlane);
637     lru = glist_first_entry(&lq->q, mdcache_lru_t, q);
638     if (!lru)
639             goto next_lane;
640     refcnt = atomic_inc_int32_t(&lru->refcnt);
641     entry = container_of(lru, mdcache_entry_t, lru);
642     if (unlikely(refcnt != (LRU_SENTINEL_REFCOUNT + 1))) {
643             /* cant use it. */
644             mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);

mdcache_lru_unref() can lead to the following chain of calls:

mdcache_lru_unref() -> mdcache_lru_clean() -> mdc_clean_entry() -> cih_remove_checked()

cih_remove_checked() tries to take the partition lock in write mode, but
that lock is held by Thread 2, which is in turn waiting for the queue
lane lock that Thread 1 already holds.
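The situation just described is a two-node cycle in a wait-for graph: Thread 1 holds the qlane lock and waits for the partition lock, while Thread 2 holds the partition lock and waits for the qlane lock. The sketch below encodes such a snapshot and follows "waits for a lock held by" edges to detect the cycle. It is a simplified, hypothetical model: each thread holds at most one lock of interest, each lock has a single holder (shared rwlock readers are not modelled), and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot model: one lock held and one awaited per thread,
 * and a single holder per lock (shared rwlock readers are not modelled). */
enum lock_id { LOCK_NONE = -1, LOCK_QLANE = 0, LOCK_PARTITION = 1 };

struct thread_state {
	const char *name;
	enum lock_id holds;
	enum lock_id waits_for;
};

/* Index of the thread holding lock l, or -1 if none does. */
static int holder_of(const struct thread_state *t, size_t n, enum lock_id l)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].holds == l)
			return (int)i;
	return -1;
}

/* Follow "waits for a lock held by" edges from thread start. Getting back
 * to start within n steps means start is part of a deadlock cycle. */
static bool in_deadlock_cycle(const struct thread_state *t, size_t n,
			      size_t start)
{
	assert(start < n);
	size_t cur = start;

	for (size_t steps = 0; steps < n; steps++) {
		if (t[cur].waits_for == LOCK_NONE)
			return false;
		int next = holder_of(t, n, t[cur].waits_for);
		if (next < 0)
			return false;
		if ((size_t)next == start)
			return true;
		cur = (size_t)next;
	}
	return false;
}
```

If Thread 1 releases the qlane lock before the put, as in the fix suggested earlier in this thread, the Thread 2 -> Thread 1 edge disappears and so does the cycle.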

Thread 2:

650     if (cih_latch_entry(&entry->fh_hk.key, &latch, CIH_GET_WLOCK,
651                         __func__, __LINE__)) {
652             QLOCK(qlane);

Stack traces:

Thread 1:
#0 0x00007f571328103e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1 0x000000000052f928 in cih_remove_checked (entry=0x7f548e86c400)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
#2 0x0000000000530805 in mdc_clean_entry (entry=0x7f548e86c400)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:272
#3 0x000000000051df7e in mdcache_lru_clean (entry=0x7f548e86c400)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:590
#4 0x00000000005229c0 in _mdcache_lru_unref (entry=0x7f548e86c400,
flags=8, func=0x58b5c0 <__func__.23710> "lru_reap_impl", line=687)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1918
#5 0x000000000051e83a in lru_reap_impl (qid=LRU_ENTRY_L1)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:687

Thread 2:
#0 0x00007f57132841bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f571327fd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007f571327fc08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000051e4f5 in lru_reap_impl (qid=LRU_ENTRY_L1)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:652

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel