Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/18/2016 13:30, Andriy Gapon wrote:
On 14/11/2016 14:00, Henri Hennebert wrote:
On 11/14/2016 12:45, Andriy Gapon wrote:

Okay. Luckily for us, it seems that 'm' is available in frame 5. It also happens to be the first field of 'struct faultstate'. So, could you please go to that frame and print '*m' and '*(struct faultstate *)m' ?

(kgdb) fr 4
#4 0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
753     msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev = 0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0}, pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v = 18446735281353604048}}, listq = {tqe_next = 0x0, tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11, phys_addr = 3389358080, md = {pv_list = {tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode = 6}, wire_count = 0, busy_lock = 6, hold_count = 0, flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0', segind = 3 '\003', order = 13 '\r', pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}

If I interpret this correctly, the page is in the 'exclusive busy' state. Unfortunately, I can't tell much beyond that. But I am confident that this is the root cause of the lock-up.

(kgdb) print *(struct faultstate *)m
$2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, first_m = 0xf800dc5d85c0, first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, entry = 0x0, lookup_still_valid = -546779784, vp = 0x601aa}
(kgdb)

I was wrong on this one as 'm' is actually a pointer, so the above is not correct. Maybe 'info reg' in frame 5 would give a clue about the value of 'fs'.
(kgdb) fr 5
#5 0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
1086    vm_page_busy_sleep(m, msg);
(kgdb) info reg
rax    0x0                 0
rbx    0xf800b62e9c78      -8793036514184
rcx    0x0                 0
rdx    0x0                 0
rsi    0x0                 0
rdi    0x0                 0
rbp    0xfe0101836810      0xfe0101836810
rsp    0xfe01018367e0      0xfe01018367e0
r8     0x0                 0
r9     0x0                 0
r10    0x0                 0
r11    0x0                 0
r12    0xf800b642aa00      -879303520
r13    0xf800df68cd40      -8792344834752
r14    0xf800b62e9c60      -8793036514208
r15    0x809c51bc          -2137239108
rip    0x8089dd4d          0x8089dd4d
eflags 0x0                 0
cs     0x0                 0
ss     0x0                 0
ds     0x0                 0
es     0x0                 0
fs     0x0                 0
gs     0x0                 0

I don't know what to do from here.

I am not sure how to proceed from here. The only thing I can think of is a lock order reversal between the vnode lock and the page busying quasi-lock. But examining the code I cannot spot it. Another possibility is a leak of a busy page, but that's hard to debug. How hard is it to reproduce the problem?

After 7 days all seems normal, with only one copy of innd:

[root@avoriaz ~]# ps xa|grep inn
 1193  -  Is  0:01.40 /usr/local/news/bin/innd -r
13498  -  IN  0:00.01 /usr/local/news/bin/innfeed
 1194 v0- IW  0:00.00 /bin/sh /usr/local/news/bin/innwatch -i 60

I will try to stop and restart innd. All continues to look good:

[root@avoriaz ~]# ps xa|grep inn
31673  -  Ss  0:00.02 /usr/local/news/bin/innd
31694  -  SN  0:00.01 /usr/local/news/bin/innfeed
31674  0  S   0:00.01 /bin/sh /usr/local/news/bin/innwatch -i 60

I think the only way to reproduce it is to wait until it occurs by itself... One thing to note: the deadlock has occurred at least 5 times since 10.0-RELEASE, and always with the directory /usr/local/news/bin.

Maybe Konstantin would have some ideas or suggestions.

Henri
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 14/11/2016 14:00, Henri Hennebert wrote: > On 11/14/2016 12:45, Andriy Gapon wrote: >> Okay. Luckily for us, it seems that 'm' is available in frame 5. It also >> happens to be the first field of 'struct faultstate'. So, could you please >> go >> to frame and print '*m' and '*(struct faultstate *)m' ? >> > (kgdb) fr 4 > #4 0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, > wmesg= optimized out>) at /usr/src/sys/vm/vm_page.c:753 > 753msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0); > (kgdb) print *m > $1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev = > 0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0}, > pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v = > 18446735281353604048}}, listq = {tqe_next = 0x0, > tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11, > phys_addr = 3389358080, md = {pv_list = { > tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode > = > 6}, wire_count = 0, busy_lock = 6, hold_count = 0, > flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 > '\0', > segind = 3 '\003', order = 13 '\r', > pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'} If I interpret this correctly the page is in the 'exclusive busy' state. Unfortunately, I can't tell much beyond that. But I am confident that this is the root cause of the lock-up. > (kgdb) print *(struct faultstate *)m > $2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, > first_m = > 0xf800dc5d85c0, > first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, > entry > = 0x0, lookup_still_valid = -546779784, > vp = 0x601aa} > (kgdb) I was wrong on this one as 'm' is actually a pointer, so the above is not correct. Maybe 'info reg' in frame 5 would give a clue about the value of 'fs'. I am not sure how to proceed from here. The only thing I can think of is a lock order reversal between the vnode lock and the page busying quasi-lock. But examining the code I can not spot it. 
Another possibility is a leak of a busy page, but that's hard to debug. How hard is it to reproduce the problem? Maybe Konstantin would have some ideas or suggestions.

-- 
Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/14/2016 12:45, Andriy Gapon wrote: On 14/11/2016 11:35, Henri Hennebert wrote: On 11/14/2016 10:07, Andriy Gapon wrote: Hmm, I've just noticed another interesting thread: Thread 668 (Thread 101245): #0 sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0x80561ae2 in mi_switch (flags=, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455 #2 0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646 #3 0x805614b1 in _sleep (ident=, lock=, priority=, wmesg=0x809c51bc "vmpfw", sbt=0, pr=, flags=) at /usr/src/sys/kern/kern_synch.c:229 #4 0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=) at /usr/src/sys/vm/vm_page.c:753 #5 0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086 #6 0x80886be9 in vm_fault_hold (map=, vaddr=, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495 #7 0x80885448 in vm_fault (map=0xf80011d66000, vaddr=, fault_type=4 '\004', fault_flags=) at /usr/src/sys/vm/vm_fault.c:273 #8 0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) at /usr/src/sys/amd64/amd64/trap.c:741 #9 0x808d3386 in trap (frame=0xfe0101836c00) at /usr/src/sys/amd64/amd64/trap.c:333 #10 0x808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 This tread is another program from the news system: 668 Thread 101245 (PID=49124: innfeed) sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973 I strongly suspect that this is thread that we were looking for. I think that it has the vnode lock in the shared mode while trying to fault in a page. --clip-- Okay. Luckily for us, it seems that 'm' is available in frame 5. It also happens to be the first field of 'struct faultstate'. So, could you please go to frame and print '*m' and '*(struct faultstate *)m' ? 
(kgdb) fr 4
#4 0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
753     msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev = 0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0}, pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v = 18446735281353604048}}, listq = {tqe_next = 0x0, tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11, phys_addr = 3389358080, md = {pv_list = {tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode = 6}, wire_count = 0, busy_lock = 6, hold_count = 0, flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0', segind = 3 '\003', order = 13 '\r', pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}
(kgdb) print *(struct faultstate *)m
$2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, first_m = 0xf800dc5d85c0, first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, entry = 0x0, lookup_still_valid = -546779784, vp = 0x601aa}
(kgdb)
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 14/11/2016 11:35, Henri Hennebert wrote: > > > On 11/14/2016 10:07, Andriy Gapon wrote: >> Hmm, I've just noticed another interesting thread: >> Thread 668 (Thread 101245): >> #0 sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, >> flags=> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973 >> #1 0x80561ae2 in mi_switch (flags=, newtd=0x0) >> at >> /usr/src/sys/kern/kern_synch.c:455 >> #2 0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at >> /usr/src/sys/kern/subr_sleepqueue.c:646 >> #3 0x805614b1 in _sleep (ident=, lock=> optimized out>, priority=, wmesg=0x809c51bc >> "vmpfw", sbt=0, pr=, flags=) at >> /usr/src/sys/kern/kern_synch.c:229 >> #4 0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, >> wmesg=> optimized out>) at /usr/src/sys/vm/vm_page.c:753 >> #5 0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, >> msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086 >> #6 0x80886be9 in vm_fault_hold (map=, >> vaddr=> optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at >> /usr/src/sys/vm/vm_fault.c:495 >> #7 0x80885448 in vm_fault (map=0xf80011d66000, vaddr=> optimized out>, fault_type=4 '\004', fault_flags=) at >> /usr/src/sys/vm/vm_fault.c:273 >> #8 0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) >> at >> /usr/src/sys/amd64/amd64/trap.c:741 >> #9 0x808d3386 in trap (frame=0xfe0101836c00) at >> /usr/src/sys/amd64/amd64/trap.c:333 >> #10 0x808b7af1 in calltrap () at >> /usr/src/sys/amd64/amd64/exception.S:236 > > This tread is another program from the news system: > 668 Thread 101245 (PID=49124: innfeed) sched_switch (td=0xf800b642aa00, > newtd=0xf8000285f000, flags=) at > /usr/src/sys/kern/sched_ule.c:1973 > >> >> I strongly suspect that this is thread that we were looking for. >> I think that it has the vnode lock in the shared mode while trying to fault >> in a >> page. >> >> Could you please check that by going to frame 6 and printing 'fs' and >> '*fs.vp'? 
>> It'd be interesting to understand why this thread is waiting here.
>> So, please also print '*fs.m' and '*fs.object'.
>
> No luck :-(
> (kgdb) fr 6
> #6 0x80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
> 495     vm_page_sleep_if_busy(fs.m, "vmpfw");
> (kgdb) print fs
> Cannot access memory at address 0x1fa0
> (kgdb)

Okay. Luckily for us, it seems that 'm' is available in frame 5. It also happens to be the first field of 'struct faultstate'. So, could you please go to that frame and print '*m' and '*(struct faultstate *)m' ?

-- 
Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/14/2016 10:07, Andriy Gapon wrote: On 13/11/2016 15:28, Henri Hennebert wrote: On 11/13/2016 11:06, Andriy Gapon wrote: On 12/11/2016 14:40, Henri Hennebert wrote: [snip] Could you please show 'info local' in frame 14? I expected that 'nd' variable would be defined there and it may contain some useful information. No luck there: (kgdb) fr 14 #14 0x80636838 in kern_statat (td=0xf80009ba0500, flag=, fd=-100, path=0x0, pathseg=, sbp=, hook=0x800e2a388) at /usr/src/sys/kern/vfs_syscalls.c:2160 2160if ((error = namei()) != 0) (kgdb) info local rights = nd = error = sb = (kgdb) I also try to get information from the execve of the other treads: for tid 101250: (kgdb) fr 10 #10 0x80508ccc in sys_execve (td=0xf800b6429000, uap=0xfe010184fb80) at /usr/src/sys/kern/kern_exec.c:218 218error = kern_execve(td, , NULL); (kgdb) print *uap $4 = {fname_l_ = 0xfe010184fb80 "`\220\217\002\b", fname = 0x8028f9060 , fname_r_ = 0xfe010184fb88 "`¶ÿÿÿ\177", argv_l_ = 0xfe010184fb88 "`¶ÿÿÿ\177", argv = 0x7fffb660, argv_r_ = 0xfe010184fb90 "\bÜÿÿÿ\177", envv_l_ = 0xfe010184fb90 "\bÜÿÿÿ\177", envv = 0x7fffdc08, envv_r_ = 0xfe010184fb98 ""} (kgdb) for tid 101243: (kgdb) f 15 #15 0x80508ccc in sys_execve (td=0xf800b642b500, uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218 218error = kern_execve(td, , NULL); (kgdb) print *uap $5 = {fname_l_ = 0xfe010182cb80 "ÀÏ\205\002\b", fname = 0x80285cfc0 , fname_r_ = 0xfe010182cb88 "`¶ÿÿÿ\177", argv_l_ = 0xfe010182cb88 "`¶ÿÿÿ\177", argv = 0x7fffb660, argv_r_ = 0xfe010182cb90 "\bÜÿÿÿ\177", envv_l_ = 0xfe010182cb90 "\bÜÿÿÿ\177", envv = 0x7fffdc08, envv_r_ = 0xfe010182cb98 ""} (kgdb) I think that you see garbage in those structures because they contain pointers to userland data. 
Hmm, I've just noticed another interesting thread: Thread 668 (Thread 101245): #0 sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0x80561ae2 in mi_switch (flags=, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455 #2 0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646 #3 0x805614b1 in _sleep (ident=, lock=, priority=, wmesg=0x809c51bc "vmpfw", sbt=0, pr=, flags=) at /usr/src/sys/kern/kern_synch.c:229 #4 0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=) at /usr/src/sys/vm/vm_page.c:753 #5 0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086 #6 0x80886be9 in vm_fault_hold (map=, vaddr=, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495 #7 0x80885448 in vm_fault (map=0xf80011d66000, vaddr=, fault_type=4 '\004', fault_flags=) at /usr/src/sys/vm/vm_fault.c:273 #8 0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) at /usr/src/sys/amd64/amd64/trap.c:741 #9 0x808d3386 in trap (frame=0xfe0101836c00) at /usr/src/sys/amd64/amd64/trap.c:333 #10 0x808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 This tread is another program from the news system: 668 Thread 101245 (PID=49124: innfeed) sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=out>) at /usr/src/sys/kern/sched_ule.c:1973 I strongly suspect that this is thread that we were looking for. I think that it has the vnode lock in the shared mode while trying to fault in a page. Could you please check that by going to frame 6 and printing 'fs' and '*fs.vp'? It'd be interesting to understand why this thread is waiting here. So, please also print '*fs.m' and '*fs.object'. 
No luck :-(

(kgdb) fr 6
#6 0x80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
495     vm_page_sleep_if_busy(fs.m, "vmpfw");
(kgdb) print fs
Cannot access memory at address 0x1fa0
(kgdb)

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 13/11/2016 15:28, Henri Hennebert wrote: > On 11/13/2016 11:06, Andriy Gapon wrote: >> On 12/11/2016 14:40, Henri Hennebert wrote: >>> I attatch it >> >> Thank you! >> So, these two threads are trying to get the lock in the exclusive mode: >> Thread 687 (Thread 101243): >> #0 sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, >> flags=> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973 >> #1 0x80561ae2 in mi_switch (flags=, newtd=0x0) >> at >> /usr/src/sys/kern/kern_synch.c:455 >> #2 0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at >> /usr/src/sys/kern/subr_sleepqueue.c:646 >> #3 0x8052f854 in sleeplk (lk=, flags=> optimized out>, ilk=, wmesg=0x813be535 "zfs", >> pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222 >> #4 0x8052f39d in __lockmgr_args (lk=, >> flags=> optimized out>, ilk=, wmesg=, >> pri=, timo=, file=> out>, line=) at /usr/src/sys/kern/kern_lock.c:958 >> #5 0x80616a8c in vop_stdlock (ap=) at >> lockmgr.h:98 >> #6 0x8093784d in VOP_LOCK1_APV (vop=, a=> optimized out>) at vnode_if.c:2087 >> #7 0x8063c5b3 in _vn_lock (vp=, flags=548864, >> file=, line=) at vnode_if.h:859 >> #8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, >> td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523 >> #9 0x806118b9 in cache_lookup (dvp=, vpp=> optimized out>, cnp=, tsp=, >> ticksp=) at /usr/src/sys/kern/vfs_cache.c:686 >> #10 0x806133dc in vfs_cache_lookup (ap=) at >> /usr/src/sys/kern/vfs_cache.c:1081 >> #11 0x80935777 in VOP_LOOKUP_APV (vop=, a=> optimized out>) at vnode_if.c:127 >> #12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54 >> #13 0x8061c492 in namei (ndp=) at >> /usr/src/sys/kern/vfs_lookup.c:306 >> #14 0x80509395 in kern_execve (td=, args=> optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443 >> #15 0x80508ccc in sys_execve (td=0xf800b642b500, >> uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218 >> #16 0x808d449e in amd64_syscall (td=, traced=0) >> at >> subr_syscall.c:135 >> #17 0x808b7ddb in Xfast_syscall () at >> 
/usr/src/sys/amd64/amd64/exception.S:396 >> >> Thread 681 (Thread 101147): >> #0 sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, >> flags=> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973 >> #1 0x80561ae2 in mi_switch (flags=, newtd=0x0) >> at >> /usr/src/sys/kern/kern_synch.c:455 >> #2 0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at >> /usr/src/sys/kern/subr_sleepqueue.c:646 >> #3 0x8052f854 in sleeplk (lk=, flags=> optimized out>, ilk=, wmesg=0x813be535 "zfs", >> pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222 >> #4 0x8052f39d in __lockmgr_args (lk=, >> flags=> optimized out>, ilk=, wmesg=, >> pri=, timo=, file=> out>, line=) at /usr/src/sys/kern/kern_lock.c:958 >> #5 0x80616a8c in vop_stdlock (ap=) at >> lockmgr.h:98 >> #6 0x8093784d in VOP_LOCK1_APV (vop=, a=> optimized out>) at vnode_if.c:2087 >> #7 0x8063c5b3 in _vn_lock (vp=, flags=548864, >> file=, line=) at vnode_if.h:859 >> #8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, >> td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523 >> #9 0x806118b9 in cache_lookup (dvp=, vpp=> optimized out>, cnp=, tsp=, >> ticksp=) at /usr/src/sys/kern/vfs_cache.c:686 >> #10 0x806133dc in vfs_cache_lookup (ap=) at >> /usr/src/sys/kern/vfs_cache.c:1081 >> #11 0x80935777 in VOP_LOOKUP_APV (vop=, a=> optimized out>) at vnode_if.c:127 >> #12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54 >> #13 0x8061c492 in namei (ndp=) at >> /usr/src/sys/kern/vfs_lookup.c:306 >> #14 0x80509395 in kern_execve (td=, args=> optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443 >> #15 0x80508ccc in sys_execve (td=0xf80065f4e500, >> uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218 >> #16 0x808d449e in amd64_syscall (td=, traced=0) >> at >> subr_syscall.c:135 >> #17 0x808b7ddb in Xfast_syscall () at >> /usr/src/sys/amd64/amd64/exception.S:396 > > This 2 threads are innd processes. 
In core.txt.4: > >8 14789 29165 0 24 4 40040 6612 zfs DN- 0:00.00 [innd] >8 29165 1 0 20 0 42496 6888 select Ds- 0:01.33 [innd] >8 49778 29165 0 24 4 40040 6900 zfs DN- 0:00.00 [innd] >8 82034 29165 0 24 4 132 0 zfs DN- 0:00.00 [innd] > > the corresponding info treads are: > > 687 Thread 101243 (PID=49778: innd) sched_switch (td=0xf800b642b500, > newtd=0xf8000285ea00, flags=) at > /usr/src/sys/kern/sched_ule.c:1973 > 681 Thread 101147 (PID=14789: innd) sched_switch (td=0xf80065f4e500, > newtd=0xf8000285f000, flags=) at > /usr/src/sys/kern/sched_ule.c:1973 > 669 Thread 101250 (PID=82034: innd) sched_switch (td=0xf800b6429000, >
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/13/2016 14:28, Henri Hennebert wrote:

These two threads are innd processes. In core.txt.4:

8 14789 29165 0 24 4 40040 6612 zfs DN- 0:00.00 [innd]
8 29165 1 0 20 0 42496 6888 select Ds- 0:01.33 [innd]
8 49778 29165 0 24 4 40040 6900 zfs DN- 0:00.00 [innd]
8 82034 29165 0 24 4 132 0 zfs DN- 0:00.00 [innd]

the corresponding info threads are:

687 Thread 101243 (PID=49778: innd) sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
681 Thread 101147 (PID=14789: innd) sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
669 Thread 101250 (PID=82034: innd) sched_switch (td=0xf800b6429000, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
665 Thread 101262 (PID=29165: innd) sched_switch (td=0xf800b6b54a00, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973

In case it may help, I had a look at innd. These processes call execv in two places: one to execute /bin/sh and the other to re-execute innd itself:

/*
**  Re-exec ourselves.
*/
static const char *
CCxexec(char *av[])
{
    char    *innd;
    char    *p;
    int     i;

    if (CCargv == NULL)
        return "1 no argv!";

    innd = concatpath(innconf->pathbin, "innd");

    /* Get the pathname. */
    p = av[0];
    if (*p == '\0' || strcmp(p, "innd") == 0)
        CCargv[0] = innd;
    else
        return "1 Bad value";

#ifdef DO_PERL
    PLmode(Mode, OMshutdown, av[0]);
#endif
#ifdef DO_PYTHON
    PYmode(Mode, OMshutdown, av[0]);
#endif

    JustCleanup();
    syslog(L_NOTICE, "%s execv %s", LogName, CCargv[0]);

    /* Close all fds to protect possible fd leaking across successive
       innds. */
    for (i = 3; i < 30; i++)
        close(i);

    execv(CCargv[0], CCargv);
    syslog(L_FATAL, "%s cant execv %s %m", LogName, CCargv[0]);
    _exit(1);
    /* NOTREACHED */
    return "1 Exit failed";
}

The culprit may be /usr/local/news/bin/innd; remember that find is locked in /usr/local/news/bin.

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/13/2016 11:06, Andriy Gapon wrote: On 12/11/2016 14:40, Henri Hennebert wrote: I attatch it Thank you! So, these two threads are trying to get the lock in the exclusive mode: Thread 687 (Thread 101243): #0 sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0x80561ae2 in mi_switch (flags=, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455 #2 0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646 #3 0x8052f854 in sleeplk (lk=, flags=, ilk=, wmesg=0x813be535 "zfs", pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222 #4 0x8052f39d in __lockmgr_args (lk=, flags=, ilk=, wmesg=, pri=, timo=, file=, line=) at /usr/src/sys/kern/kern_lock.c:958 #5 0x80616a8c in vop_stdlock (ap=) at lockmgr.h:98 #6 0x8093784d in VOP_LOCK1_APV (vop=, a=) at vnode_if.c:2087 #7 0x8063c5b3 in _vn_lock (vp=, flags=548864, file=, line=) at vnode_if.h:859 #8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523 #9 0x806118b9 in cache_lookup (dvp=, vpp=, cnp=, tsp=, ticksp=) at /usr/src/sys/kern/vfs_cache.c:686 #10 0x806133dc in vfs_cache_lookup (ap=) at /usr/src/sys/kern/vfs_cache.c:1081 #11 0x80935777 in VOP_LOOKUP_APV (vop=, a=) at vnode_if.c:127 #12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54 #13 0x8061c492 in namei (ndp=) at /usr/src/sys/kern/vfs_lookup.c:306 #14 0x80509395 in kern_execve (td=, args=, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443 #15 0x80508ccc in sys_execve (td=0xf800b642b500, uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218 #16 0x808d449e in amd64_syscall (td=, traced=0) at subr_syscall.c:135 #17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 Thread 681 (Thread 101147): #0 sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0x80561ae2 in mi_switch (flags=, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455 #2 0x805ae8da in sleepq_wait (wchan=0x0, 
pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646 #3 0x8052f854 in sleeplk (lk=, flags=, ilk=, wmesg=0x813be535 "zfs", pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222 #4 0x8052f39d in __lockmgr_args (lk=, flags=, ilk=, wmesg=, pri=, timo=, file=, line=) at /usr/src/sys/kern/kern_lock.c:958 #5 0x80616a8c in vop_stdlock (ap=) at lockmgr.h:98 #6 0x8093784d in VOP_LOCK1_APV (vop=, a=) at vnode_if.c:2087 #7 0x8063c5b3 in _vn_lock (vp=, flags=548864, file=, line=) at vnode_if.h:859 #8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523 #9 0x806118b9 in cache_lookup (dvp=, vpp=, cnp=, tsp=, ticksp=) at /usr/src/sys/kern/vfs_cache.c:686 #10 0x806133dc in vfs_cache_lookup (ap=) at /usr/src/sys/kern/vfs_cache.c:1081 #11 0x80935777 in VOP_LOOKUP_APV (vop=, a=) at vnode_if.c:127 #12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54 #13 0x8061c492 in namei (ndp=) at /usr/src/sys/kern/vfs_lookup.c:306 #14 0x80509395 in kern_execve (td=, args=, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443 #15 0x80508ccc in sys_execve (td=0xf80065f4e500, uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218 #16 0x808d449e in amd64_syscall (td=, traced=0) at subr_syscall.c:135 #17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 This 2 threads are innd processes. 
In core.txt.4:

8 14789 29165 0 24 4 40040 6612 zfs DN- 0:00.00 [innd]
8 29165 1 0 20 0 42496 6888 select Ds- 0:01.33 [innd]
8 49778 29165 0 24 4 40040 6900 zfs DN- 0:00.00 [innd]
8 82034 29165 0 24 4 132 0 zfs DN- 0:00.00 [innd]

the corresponding info threads are:

687 Thread 101243 (PID=49778: innd) sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
681 Thread 101147 (PID=14789: innd) sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
669 Thread 101250 (PID=82034: innd) sched_switch (td=0xf800b6429000, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
665 Thread 101262 (PID=29165: innd) sched_switch (td=0xf800b6b54a00, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973

So your missing thread must be 101250:

(kgdb) tid 101250
[Switching to thread 669 (Thread 101250)]#0 sched_switch (td=0xf800b6429000, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
1973    cpuid = PCPU_GET(cpuid);
Current language: auto; currently minimal
(kgdb) bt
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 12/11/2016 14:40, Henri Hennebert wrote: > I attatch it Thank you! So, these two threads are trying to get the lock in the exclusive mode: Thread 687 (Thread 101243): #0 sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0x80561ae2 in mi_switch (flags=, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455 #2 0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646 #3 0x8052f854 in sleeplk (lk=, flags=, ilk=, wmesg=0x813be535 "zfs", pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222 #4 0x8052f39d in __lockmgr_args (lk=, flags=, ilk=, wmesg=, pri=, timo=, file=, line=) at /usr/src/sys/kern/kern_lock.c:958 #5 0x80616a8c in vop_stdlock (ap=) at lockmgr.h:98 #6 0x8093784d in VOP_LOCK1_APV (vop=, a=) at vnode_if.c:2087 #7 0x8063c5b3 in _vn_lock (vp=, flags=548864, file=, line=) at vnode_if.h:859 #8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523 #9 0x806118b9 in cache_lookup (dvp=, vpp=, cnp=, tsp=, ticksp=) at /usr/src/sys/kern/vfs_cache.c:686 #10 0x806133dc in vfs_cache_lookup (ap=) at /usr/src/sys/kern/vfs_cache.c:1081 #11 0x80935777 in VOP_LOOKUP_APV (vop=, a=) at vnode_if.c:127 #12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54 #13 0x8061c492 in namei (ndp=) at /usr/src/sys/kern/vfs_lookup.c:306 #14 0x80509395 in kern_execve (td=, args=, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443 #15 0x80508ccc in sys_execve (td=0xf800b642b500, uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218 #16 0x808d449e in amd64_syscall (td=, traced=0) at subr_syscall.c:135 #17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 Thread 681 (Thread 101147): #0 sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0x80561ae2 in mi_switch (flags=, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455 #2 0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at 
/usr/src/sys/kern/subr_sleepqueue.c:646
#3 0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs", pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4 0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5 0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6 0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
#7 0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864, file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
#9 0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>, ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=<value optimized out>) at /usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf80065f4e500, uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396

And the original stuck thread wants to get the lock in the shared mode. And there should be another thread that already holds the lock in the shared mode. But I am not able to identify it. I wonder if the original thread could be trying to get the lock recursively...

It would be interesting to get more details from thread 101112. You can switch to it using the tid command, you can use 'fr' to select frames, 'info local' and 'info args' to see what variables are available (not optimized out), and then you can print any that look interesting. It would be nice to get a file path and a directory vnode where the lookup is called. Thank you.
-- 
Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/11/2016 16:50, Henri Hennebert wrote:
>
> On 11/11/2016 12:24, Andriy Gapon wrote:
>>
>> At this stage I would try to get a system crash dump for post-mortem analysis.
>> There are a few ways to do that. You can enter ddb and then run 'dump' and
>> 'reset' commands. Or you can just do `sysctl debug.kdb.panic=1`.
>> In either case, please double-check that your system has a dump device
>> configured.
>>
> It takes some time to upload the dump...
>
> You can find it at
>
> http://tignes.restart.be/Xfer/

Could you please open the dump in kgdb and execute the following commands?

set logging on
set logging redirect on
set pagination off
thread apply all bt
quit

After that you should get a gdb.txt file in the current directory. I would like to see it. Thank you.

-- 
Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/11/2016 12:24, Andriy Gapon wrote: At this stage I would try to get a system crash dump for post-mortem analysis. There are a few ways to do that. You can enter ddb and then run the 'dump' and 'reset' commands. Or you can just do `sysctl debug.kdb.panic=1`. In either case, please double-check that your system has a dump device configured. It takes some time to upload the dump... You can find it at http://tignes.restart.be/Xfer/ Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 21:41, Henri Hennebert wrote: > On 11/10/2016 19:40, Andriy Gapon wrote: >> On 10/11/2016 19:55, Henri Hennebert wrote: >>> >>> >>> On 11/10/2016 18:33, Andriy Gapon wrote: On 10/11/2016 18:12, Henri Hennebert wrote: > On 11/10/2016 16:54, Andriy Gapon wrote: >> On 10/11/2016 17:20, Henri Hennebert wrote: >>> On 11/10/2016 15:00, Andriy Gapon wrote: Interesting. I can not spot any suspicious thread that would hold the vnode lock. Could you please run kgdb (just like that, no arguments), then execute 'bt' command and then select a frame when _vn_lock is called with 'fr N' command. Then please 'print *vp' and share the result. >>> I Think I miss something in your request: >> >> Oh, sorry! The very first step should be 'tid 101112' to switch to the >> correct >> context. >> > > (kgdb) fr 7 > #7 0x8063c5b3 in _vn_lock (vp=, > flags=2121728, "value optimized out" - not good > file=, > line=) at vnode_if.h:859 > 859vnode_if.h: No such file or directory. > in vnode_if.h > (kgdb) print *vp I am not sure if this output is valid, because of the message above. Could you please try to navigate to nearby frames and see if vp itself has a valid value there. If you can find such a frame please do *vp there. >>> >>> Does this seems better? >> >> Yes! 
>> >>> (kgdb) fr 8 >>> #8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, >>> td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523 >>> 2523if ((error = vn_lock(vp, flags)) != 0) { >>> (kgdb) print *vp >>> $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data = >>> 0xf80049c1f420, v_mount = 0xf800093aa660, >>> v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = >>> 0xf80049c2bb30}, >>> v_un = {vu_mount = 0x0, vu_socket = 0x0, >>> vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev >>> = >>> 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = { >>> tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, >>> v_cache_dd = >>> 0x0, v_lock = {lock_object = { >>> lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0, >>> lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, >>> lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = >>> 0x8099e9e0 "vnode interlock", lo_flags = 16973824, >>> lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = >>> 0xf80049c2c068, v_actfreelist = { >>> tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj >>> = >>> {bo_lock = {lock_object = { >>> lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = >>> 86179840, >>> lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, >>> bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, >>> bo_synclist = >>> {le_next = 0x0, le_prev = 0x0}, >>> bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, >>> bo_clean = >>> {bv_hd = {tqh_first = 0x0, >>> tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = >>> 0}, >>> bo_dirty = {bv_hd = {tqh_first = 0x0, >>> tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = >>> 0}, >>> bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, >>> v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = >>> {tqh_first = 0x0, tqh_last = 0xf80049c2c188}, >>> rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, >>> v_holdcnt = 9, v_usecount = 6, v_iflag = 512, >>> v_vflag = 32, 
v_writecount = 0, v_hash = 4833984, v_type = VREG} >>> (kgdb) >> >> flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT >> lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | >> LK_SHARED_WAITERS | >> LK_SHARE >> >> So, here's what we have here: this thread tries to get a shared lock on the >> vnode, the vnode is already locked in shared mode, but there is an exclusive >> waiter (or, perhaps, multiple waiters). So, this thread can not get the lock >> because of the exclusive waiter. And I do not see an easy way to identify >> that >> waiter. >> >> In the procstat output that you provided earlier there was no other thread in >> vn_lock. Hmm, I see this: >> procstat: sysctl: kern.proc.kstack: 14789: Device busy >> procstat: sysctl: kern.proc.kstack: 82034: Device busy >> >> Could you please check what those two processes are (if they are still >> running)? >> Perhaps try procstat for each of the pids several times. >> At this stage I would try to get a system crash dump for post-mortem analysis. There are a few way to do that. You can enter ddb and then run 'dump' and 'reset' commands. Or you can just do `sysctl debug.kdb.panic=1`. In either case, please double-check that your system has a dump device configured. > This 2 processes are the 2
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 19:40, Andriy Gapon wrote: On 10/11/2016 19:55, Henri Hennebert wrote: On 11/10/2016 18:33, Andriy Gapon wrote: On 10/11/2016 18:12, Henri Hennebert wrote: On 11/10/2016 16:54, Andriy Gapon wrote: On 10/11/2016 17:20, Henri Hennebert wrote: On 11/10/2016 15:00, Andriy Gapon wrote: Interesting. I can not spot any suspicious thread that would hold the vnode lock. Could you please run kgdb (just like that, no arguments), then execute 'bt' command and then select a frame when _vn_lock is called with 'fr N' command. Then please 'print *vp' and share the result. I Think I miss something in your request: Oh, sorry! The very first step should be 'tid 101112' to switch to the correct context. (kgdb) fr 7 #7 0x8063c5b3 in _vn_lock (vp=, flags=2121728, "value optimized out" - not good file=, line=) at vnode_if.h:859 859vnode_if.h: No such file or directory. in vnode_if.h (kgdb) print *vp I am not sure if this output is valid, because of the message above. Could you please try to navigate to nearby frames and see if vp itself has a valid value there. If you can find such a frame please do *vp there. Does this seems better? Yes! 
(kgdb) fr 8 #8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523 2523if ((error = vn_lock(vp, flags)) != 0) { (kgdb) print *vp $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data = 0xf80049c1f420, v_mount = 0xf800093aa660, v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049c2bb30}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = { tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, v_cache_dd = 0x0, v_lock = {lock_object = { lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0x8099e9e0 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xf80049c2c068, v_actfreelist = { tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj = {bo_lock = {lock_object = { lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xf80049c2c188}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 9, v_usecount = 6, v_iflag = 512, v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG} (kgdb) flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | 
LK_SHARED_WAITERS | LK_SHARE So, here's what we have here: this thread tries to get a shared lock on the vnode, the vnode is already locked in shared mode, but there is an exclusive waiter (or, perhaps, multiple waiters). So, this thread can not get the lock because of the exclusive waiter. And I do not see an easy way to identify that waiter. In the procstat output that you provided earlier there was no other thread in vn_lock. Hmm, I see this: procstat: sysctl: kern.proc.kstack: 14789: Device busy procstat: sysctl: kern.proc.kstack: 82034: Device busy Could you please check what those two processes are (if they are still running)? Perhaps try procstat for each of the pids several times. These 2 processes are the 2 instances of the innd daemon (news server), which is consistent with the directory /usr/local/news/bin. [root@avoriaz ~]# procstat 14789 PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM 14789 29165 29165 29165 0 1 root zfs FreeBSD ELF64 innd [root@avoriaz ~]# procstat 82034 PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM 82034 29165 29165 29165 0 1 root zfs FreeBSD ELF64 innd [root@avoriaz ~]# procstat -f 14789 procstat: kinfo_getfile(): Device busy PID COMM FD T V FLAGS REF OFFSET PRO NAME [root@avoriaz ~]# procstat -f 14789 procstat: kinfo_getfile(): Device busy PID COMM FD T V FLAGS REF OFFSET PRO NAME [root@avoriaz ~]# procstat -f 14789 procstat: kinfo_getfile(): Device busy PID COMM FD T V FLAGS
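For reference, the flag arithmetic above can be reproduced with a small Python sketch. The constant values are copied from sys/sys/lockmgr.h as of FreeBSD 11.0; verify them against your own source tree before trusting the output, and note that the decode() helper is only for illustration:

```python
# vn_lock()/vget() request flags (the 'flags' argument seen in kgdb).
# Values assumed from sys/sys/lockmgr.h, FreeBSD 11.0.
REQUEST_FLAGS = {
    0x002000: "LK_NODDLKTREAT",
    0x004000: "LK_VNHELD",
    0x080000: "LK_EXCLUSIVE",
    0x200000: "LK_SHARED",
}

# lockmgr state bits (the 'lk_lock' field of the vnode's v_lock).
STATE_FLAGS = {
    0x01: "LK_SHARE",
    0x02: "LK_SHARED_WAITERS",
    0x04: "LK_EXCLUSIVE_WAITERS",
    0x08: "LK_EXCLUSIVE_SPINNERS",
}
LK_SHARERS_SHIFT = 4  # the share count lives above the low state bits

def decode(value, table):
    """Names of all bits from 'table' that are set in 'value'."""
    return [name for bit, name in sorted(table.items()) if value & bit]

# flags=2121728 (0x206000): a shared-lock request on a held vnode
print(decode(2121728, REQUEST_FLAGS))

# lk_lock=23 (0x17): held shared by one sharer, with both shared and
# exclusive waiters queued -- the state analyzed above
print(decode(23, STATE_FLAGS), "sharers =", 23 >> LK_SHARERS_SHIFT)
```

The second decode shows why the thread is stuck: lockmgr will not grant a new shared request while LK_EXCLUSIVE_WAITERS is set, to avoid starving the exclusive waiter.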
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 19:55, Henri Hennebert wrote: > > > On 11/10/2016 18:33, Andriy Gapon wrote: >> On 10/11/2016 18:12, Henri Hennebert wrote: >>> On 11/10/2016 16:54, Andriy Gapon wrote: On 10/11/2016 17:20, Henri Hennebert wrote: > On 11/10/2016 15:00, Andriy Gapon wrote: >> Interesting. I can not spot any suspicious thread that would hold the >> vnode >> lock. Could you please run kgdb (just like that, no arguments), then >> execute >> 'bt' command and then select a frame when _vn_lock is called with 'fr N' >> command. Then please 'print *vp' and share the result. >> > I Think I miss something in your request: Oh, sorry! The very first step should be 'tid 101112' to switch to the correct context. >>> >>> (kgdb) fr 7 >>> #7 0x8063c5b3 in _vn_lock (vp=, flags=2121728, >> >> "value optimized out" - not good >> >>> file=, >>> line=) at vnode_if.h:859 >>> 859vnode_if.h: No such file or directory. >>> in vnode_if.h >>> (kgdb) print *vp >> >> I am not sure if this output is valid, because of the message above. >> Could you please try to navigate to nearby frames and see if vp itself has a >> valid value there. If you can find such a frame please do *vp there. >> > > Does this seems better? Yes! 
> (kgdb) fr 8 > #8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, > td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523 > 2523if ((error = vn_lock(vp, flags)) != 0) { > (kgdb) print *vp > $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data = > 0xf80049c1f420, v_mount = 0xf800093aa660, > v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = > 0xf80049c2bb30}, > v_un = {vu_mount = 0x0, vu_socket = 0x0, > vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = > 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = { > tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, > v_cache_dd = > 0x0, v_lock = {lock_object = { > lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0, > lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, > lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = > 0x8099e9e0 "vnode interlock", lo_flags = 16973824, > lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = > 0xf80049c2c068, v_actfreelist = { > tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj = > {bo_lock = {lock_object = { > lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840, > lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, > bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist = > {le_next = 0x0, le_prev = 0x0}, > bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, > bo_clean = > {bv_hd = {tqh_first = 0x0, > tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0}, > bo_dirty = {bv_hd = {tqh_first = 0x0, > tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0}, > bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, > v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = > {tqh_first = 0x0, tqh_last = 0xf80049c2c188}, > rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, > v_holdcnt = 9, v_usecount = 6, v_iflag = 512, > v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG} > (kgdb) flags=2121728 = 0x206000 = LK_SHARED | 
LK_VNHELD | LK_NODDLKTREAT lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS | LK_SHARE So, here's what we have here: this thread tries to get a shared lock on the vnode, the vnode is already locked in shared mode, but there is an exclusive waiter (or, perhaps, multiple waiters). So, this thread can not get the lock because of the exclusive waiter. And I do not see an easy way to identify that waiter. In the procstat output that you provided earlier there was no other thread in vn_lock. Hmm, I see this: procstat: sysctl: kern.proc.kstack: 14789: Device busy procstat: sysctl: kern.proc.kstack: 82034: Device busy Could you please check what those two processes are (if they are still running)? Perhaps try procstat for each of the pids several times. -- Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 18:33, Andriy Gapon wrote: On 10/11/2016 18:12, Henri Hennebert wrote: On 11/10/2016 16:54, Andriy Gapon wrote: On 10/11/2016 17:20, Henri Hennebert wrote: On 11/10/2016 15:00, Andriy Gapon wrote: Interesting. I can not spot any suspicious thread that would hold the vnode lock. Could you please run kgdb (just like that, no arguments), then execute 'bt' command and then select a frame when _vn_lock is called with 'fr N' command. Then please 'print *vp' and share the result. I Think I miss something in your request: Oh, sorry! The very first step should be 'tid 101112' to switch to the correct context. (kgdb) fr 7 #7 0x8063c5b3 in _vn_lock (vp=, flags=2121728, "value optimized out" - not good file=, line=) at vnode_if.h:859 859vnode_if.h: No such file or directory. in vnode_if.h (kgdb) print *vp I am not sure if this output is valid, because of the message above. Could you please try to navigate to nearby frames and see if vp itself has a valid value there. If you can find such a frame please do *vp there. Does this seems better? 
(kgdb) fr 8 #8 0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523 2523if ((error = vn_lock(vp, flags)) != 0) { (kgdb) print *vp $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data = 0xf80049c1f420, v_mount = 0xf800093aa660, v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049c2bb30}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = { tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, v_cache_dd = 0x0, v_lock = {lock_object = { lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0x8099e9e0 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xf80049c2c068, v_actfreelist = { tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj = {bo_lock = {lock_object = { lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xf80049c2c188}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 9, v_usecount = 6, v_iflag = 512, v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG} (kgdb) Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 18:12, Henri Hennebert wrote: > On 11/10/2016 16:54, Andriy Gapon wrote: >> On 10/11/2016 17:20, Henri Hennebert wrote: >>> On 11/10/2016 15:00, Andriy Gapon wrote: Interesting. I can not spot any suspicious thread that would hold the vnode lock. Could you please run kgdb (just like that, no arguments), then execute 'bt' command and then select a frame when _vn_lock is called with 'fr N' command. Then please 'print *vp' and share the result. >>> I Think I miss something in your request: >> >> Oh, sorry! The very first step should be 'tid 101112' to switch to the >> correct >> context. >> > > (kgdb) fr 7 > #7 0x8063c5b3 in _vn_lock (vp=, flags=2121728, "value optimized out" - not good > file=, > line=) at vnode_if.h:859 > 859vnode_if.h: No such file or directory. > in vnode_if.h > (kgdb) print *vp I am not sure if this output is valid, because of the message above. Could you please try to navigate to nearby frames and see if vp itself has a valid value there. If you can find such a frame please do *vp there. 
> $1 = {v_tag = 0x80faeb78 "â~\231\200", v_op = 0xf80009a41000, > v_data = 0x0, v_mount = 0xf80009a41010, > v_nmntvnodes = {tqe_next = 0x0, tqe_prev = 0x80edc088}, v_un = > {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, > vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0xf80009466e90, le_prev = > 0x0}, v_cache_src = {lh_first = 0xfe010186d768}, > v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfeb8a7c0}, v_cache_dd = > 0xf8000284f000, v_lock = {lock_object = { > lo_name = 0xf8002c00ee80 "", lo_flags = 0, lo_data = 0, lo_witness = > 0xf800068bd480}, > lk_lock = 1844673520268056, lk_exslpfail = 153715840, lk_timo = -2048, > lk_pri = 0}, v_interlock = {lock_object = { > lo_name = 0x18af8 address>, lo_flags = 0, lo_data = 0, > lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 0x0, v_actfreelist = > {tqe_next = 0x0, tqe_prev = 0xf80009ba05c0}, > v_bufobj = {bo_lock = {lock_object = {lo_name = 0xf80009a41000 "", > lo_flags = 1, lo_data = 0, lo_witness = 0x400ff}, > rw_lock = 2}, bo_ops = 0x1, bo_object = 0xf80049c2c068, > bo_synclist = {le_next = 0x813be535, > le_prev = 0x1}, bo_private = 0x0, __bo_vnode = 0x0, > bo_clean = > {bv_hd = {tqh_first = 0x0, tqh_last = 0x0}, > bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = > 0xf80088ac8d00, tqh_last = 0xf8003cc5b600}, > bv_root = {pt_root = 2553161591}, bv_cnt = -1741805705}, bo_numoutput = > 31, bo_flag = 0, bo_bsize = 0}, v_pollinfo = 0x0, > v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0xf88, > tqh_last = 0x19cc}, rl_currdep = 0x3f8}, > v_cstart = 16256, v_lasta = 679, v_lastw = 0, v_clen = 0, v_holdcnt = 0, > v_usecount = 2369, v_iflag = 0, v_vflag = 0, > v_writecount = 0, v_hash = 0, v_type = VNON} > (kgdb) > > Thanks for your time > > Henri -- Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 16:54, Andriy Gapon wrote: On 10/11/2016 17:20, Henri Hennebert wrote: On 11/10/2016 15:00, Andriy Gapon wrote: Interesting. I can not spot any suspicious thread that would hold the vnode lock. Could you please run kgdb (just like that, no arguments), then execute 'bt' command and then select a frame when _vn_lock is called with 'fr N' command. Then please 'print *vp' and share the result. I Think I miss something in your request: Oh, sorry! The very first step should be 'tid 101112' to switch to the correct context. (kgdb) fr 7 #7 0x8063c5b3 in _vn_lock (vp=, flags=2121728, file=, line=) at vnode_if.h:859 859 vnode_if.h: No such file or directory. in vnode_if.h (kgdb) print *vp $1 = {v_tag = 0x80faeb78 "â~\231\200", v_op = 0xf80009a41000, v_data = 0x0, v_mount = 0xf80009a41010, v_nmntvnodes = {tqe_next = 0x0, tqe_prev = 0x80edc088}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0xf80009466e90, le_prev = 0x0}, v_cache_src = {lh_first = 0xfe010186d768}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfeb8a7c0}, v_cache_dd = 0xf8000284f000, v_lock = {lock_object = { lo_name = 0xf8002c00ee80 "", lo_flags = 0, lo_data = 0, lo_witness = 0xf800068bd480}, lk_lock = 1844673520268056, lk_exslpfail = 153715840, lk_timo = -2048, lk_pri = 0}, v_interlock = {lock_object = { lo_name = 0x18af8 Bad address>, lo_flags = 0, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 0x0, v_actfreelist = {tqe_next = 0x0, tqe_prev = 0xf80009ba05c0}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0xf80009a41000 "", lo_flags = 1, lo_data = 0, lo_witness = 0x400ff}, rw_lock = 2}, bo_ops = 0x1, bo_object = 0xf80049c2c068, bo_synclist = {le_next = 0x813be535, le_prev = 0x1}, bo_private = 0x0, __bo_vnode = 0x0, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0x0}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0xf80088ac8d00, tqh_last = 0xf8003cc5b600}, bv_root = {pt_root = 2553161591}, bv_cnt = 
-1741805705}, bo_numoutput = 31, bo_flag = 0, bo_bsize = 0}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0xf88, tqh_last = 0x19cc}, rl_currdep = 0x3f8}, v_cstart = 16256, v_lasta = 679, v_lastw = 0, v_clen = 0, v_holdcnt = 0, v_usecount = 2369, v_iflag = 0, v_vflag = 0, v_writecount = 0, v_hash = 0, v_type = VNON} (kgdb) Thanks for your time Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 17:20, Henri Hennebert wrote: > On 11/10/2016 15:00, Andriy Gapon wrote: >> Interesting. I can not spot any suspicious thread that would hold the vnode >> lock. Could you please run kgdb (just like that, no arguments), then execute >> 'bt' command and then select a frame when _vn_lock is called with 'fr N' >> command. Then please 'print *vp' and share the result. >> > I Think I miss something in your request: Oh, sorry! The very first step should be 'tid 101112' to switch to the correct context. > [root@avoriaz ~]# kgdb > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > Reading symbols from /boot/kernel/zfs.ko...Reading symbols from > /usr/lib/debug//boot/kernel/zfs.ko.debug...done. > done. > Loaded symbols for /boot/kernel/zfs.ko > Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from > /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done. > done. > > --- clip --- > > Loaded symbols for /boot/kernel/accf_data.ko > Reading symbols from /boot/kernel/daemon_saver.ko...Reading symbols from > /usr/lib/debug//boot/kernel/daemon_saver.ko.debug...done. > done. 
> Loaded symbols for /boot/kernel/daemon_saver.ko > #0 sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, > flags= optimized out>) > at /usr/src/sys/kern/sched_ule.c:1973 > 1973cpuid = PCPU_GET(cpuid); > (kgdb) bt > #0 sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, > flags= optimized out>) > at /usr/src/sys/kern/sched_ule.c:1973 > #1 0x80566b15 in tc_fill_vdso_timehands32 (vdso_th32=0x0) at > /usr/src/sys/kern/kern_tc.c:2121 > #2 0x80555227 in timekeep_push_vdso () at > /usr/src/sys/kern/kern_sharedpage.c:174 > #3 0x80566226 in tc_windup () at /usr/src/sys/kern/kern_tc.c:1426 > #4 0x804eaa41 in hardclock_cnt (cnt=1, usermode= out>) > at /usr/src/sys/kern/kern_clock.c:589 > #5 0x808fac74 in handleevents (now=, fake=0) at > /usr/src/sys/kern/kern_clocksource.c:223 > #6 0x808fb1d7 in timercb (et=0x8100cf20, arg= out>) at /usr/src/sys/kern/kern_clocksource.c:352 > #7 0xf800b6429a00 in ?? () > #8 0x81051080 in vm_page_array () > #9 0x81051098 in vm_page_queue_free_mtx () > #10 0xfe0101818920 in ?? () > #11 0x805399c0 in __mtx_lock_sleep (c=, tid=Error > accessing memory address 0xffac: Bad add\ > ress. > ) at /usr/src/sys/kern/kern_mutex.c:590 > Previous frame inner to this frame (corrupt stack?) > Current language: auto; currently minimal > (kgdb) q > [root@avoriaz ~]# > > I don't find the requested frame > > Henri -- Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 15:00, Andriy Gapon wrote: On 10/11/2016 12:30, Henri Hennebert wrote: On 11/10/2016 11:21, Andriy Gapon wrote: On 09/11/2016 15:58, Eric van Gyzen wrote: On 11/09/2016 07:48, Henri Hennebert wrote: I encounter a strange deadlock on FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260: Fri Nov 4 02:51:33 CET 2016 r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64 This system is exclusively running on zfs. After 3 or 4 days, `periodic daily` is locked in the directory /usr/local/news/bin [root@avoriaz ~]# ps xa|grep find 85656 - D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam 462 1 S+ 0:00.00 grep find [root@avoriaz ~]# procstat -f 85656 PID COMMFD T V FLAGSREF OFFSET PRO NAME 85656 find text v r r--- - - - /usr/bin/find 85656 find cwd v d r--- - - - /usr/local/news/bin 85656 find root v d r--- - - - / 85656 find 0 v c r--- 3 0 - /dev/null 85656 find 1 p - rw-- 1 0 - - 85656 find 2 v r -w-- 7 17 - - 85656 find 3 v d r--- 1 0 - /home/root 85656 find 4 v d r--- 1 0 - /home/root 85656 find 5 v d rn-- 1 533545184 - /usr/local/news/bin [root@avoriaz ~]# If I try `ls /usr/local/news/bin` it is also locked. After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0' After a reset and reboot I can access /usr/local/news/bin. I delete this directory and reinstall the package `portupgrade -fu news/inn` 5 days later `periodic daily`is locked on the same directory :-o Any idea? I can't help with the deadlock, but someone who _can_ help will probably ask for the output of "procstat -kk PID" with the PID of the "find" process. In fact, it's procstat -kk -a. With just one thread we would see that a thread is blocked on something, but we won't see why that something can not be acquired. I attach the result, Interesting. I can not spot any suspicious thread that would hold the vnode lock. 
Could you please run kgdb (just like that, no arguments), then execute 'bt' command and then select a frame when _vn_lock is called with 'fr N' command. Then please 'print *vp' and share the result. I think I missed something in your request: [root@avoriaz ~]# kgdb GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done. done. Loaded symbols for /boot/kernel/zfs.ko Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done. done. --- clip --- Loaded symbols for /boot/kernel/accf_data.ko Reading symbols from /boot/kernel/daemon_saver.ko...Reading symbols from /usr/lib/debug//boot/kernel/daemon_saver.ko.debug...done. done.
Loaded symbols for /boot/kernel/daemon_saver.ko #0 sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, flags=) at /usr/src/sys/kern/sched_ule.c:1973 1973cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, flags=) at /usr/src/sys/kern/sched_ule.c:1973 #1 0x80566b15 in tc_fill_vdso_timehands32 (vdso_th32=0x0) at /usr/src/sys/kern/kern_tc.c:2121 #2 0x80555227 in timekeep_push_vdso () at /usr/src/sys/kern/kern_sharedpage.c:174 #3 0x80566226 in tc_windup () at /usr/src/sys/kern/kern_tc.c:1426 #4 0x804eaa41 in hardclock_cnt (cnt=1, usermode=optimized out>) at /usr/src/sys/kern/kern_clock.c:589 #5 0x808fac74 in handleevents (now=, fake=0) at /usr/src/sys/kern/kern_clocksource.c:223 #6 0x808fb1d7 in timercb (et=0x8100cf20, arg=optimized out>) at /usr/src/sys/kern/kern_clocksource.c:352 #7 0xf800b6429a00 in ?? () #8 0x81051080 in vm_page_array () #9 0x81051098 in vm_page_queue_free_mtx () #10 0xfe0101818920 in ?? () #11 0x805399c0 in __mtx_lock_sleep (c=, tid=Error accessing memory address 0xffac: Bad add\ ress. ) at /usr/src/sys/kern/kern_mutex.c:590 Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) q [root@avoriaz ~]# I don't find the requested frame Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 12:30, Henri Hennebert wrote: > On 11/10/2016 11:21, Andriy Gapon wrote: >> On 09/11/2016 15:58, Eric van Gyzen wrote: >>> On 11/09/2016 07:48, Henri Hennebert wrote: I encounter a strange deadlock on FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260: Fri Nov 4 02:51:33 CET 2016 r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64 This system is exclusively running on zfs. After 3 or 4 days, `periodic daily` is locked in the directory /usr/local/news/bin [root@avoriaz ~]# ps xa|grep find 85656 - D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam 462 1 S+ 0:00.00 grep find [root@avoriaz ~]# procstat -f 85656 PID COMMFD T V FLAGSREF OFFSET PRO NAME 85656 find text v r r--- - - - /usr/bin/find 85656 find cwd v d r--- - - - /usr/local/news/bin 85656 find root v d r--- - - - / 85656 find 0 v c r--- 3 0 - /dev/null 85656 find 1 p - rw-- 1 0 - - 85656 find 2 v r -w-- 7 17 - - 85656 find 3 v d r--- 1 0 - /home/root 85656 find 4 v d r--- 1 0 - /home/root 85656 find 5 v d rn-- 1 533545184 - /usr/local/news/bin [root@avoriaz ~]# If I try `ls /usr/local/news/bin` it is also locked. After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0' After a reset and reboot I can access /usr/local/news/bin. I delete this directory and reinstall the package `portupgrade -fu news/inn` 5 days later `periodic daily`is locked on the same directory :-o Any idea? >>> >>> I can't help with the deadlock, but someone who _can_ help will probably >>> ask for >>> the output of "procstat -kk PID" with the PID of the "find" process. >> >> In fact, it's procstat -kk -a. With just one thread we would see that a >> thread >> is blocked on something, but we won't see why that something can not be >> acquired. >> >> > I attach the result, Interesting. I can not spot any suspicious thread that would hold the vnode lock. 
Could you please run kgdb (just like that, no arguments), then execute the
'bt' command, and then select the frame where _vn_lock is called with the
'fr N' command. Then please 'print *vp' and share the result.

-- 
Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 11:21, Andriy Gapon wrote:
> On 09/11/2016 15:58, Eric van Gyzen wrote:
>> On 11/09/2016 07:48, Henri Hennebert wrote:
>>> I encounter a strange deadlock on
>>> [...]
>>> Any idea?
>>
>> I can't help with the deadlock, but someone who _can_ help will probably
>> ask for the output of "procstat -kk PID" with the PID of the "find" process.
>
> In fact, it's procstat -kk -a. With just one thread we would see that a
> thread is blocked on something, but we won't see why that something can
> not be acquired.
I attach the result,

Henri

[root@avoriaz ~]# procstat -kk -a
  PID    TID COMM             TDNAME           KSTACK
    0     10 kernel           swapper          mi_switch+0xd2 sleepq_timedwait+0x3a _sleep+0x281 swapper+0x464 btext+0x2c
    0     19 kernel           kqueue_ctx taskq mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe
    0 100012 kernel           aiod_kick taskq  mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe
    0 100013 kernel           thread taskq     mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe
    0 100018 kernel           firmware taskq   mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe
    0 100022 kernel           acpi_task_0      mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe
    0 100023 kernel           acpi_task_1      mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe
    0 100024 kernel           acpi_task_2      mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe
    0 100025 kernel           em0 que          mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe
    0 100026 kernel           em0 txq          mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe
    0 100027 kernel           em1 taskq        mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe
    0 100060 kernel           mca taskq        mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe
    0 100061 kernel           system_taskq_0   mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe
    0 100062 kernel           system_taskq_1   mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe
    0 100063 kernel           dbu_evict        mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe
    0 100072 kernel           CAM taskq        mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe
    0 100086 kernel           if_config_tqg_0  mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd gtaskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe
    0 100087 kernel           if_io_tqg_0      mi_switch+0xd2 sleepq_wait+0x3a
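Sifting a full `procstat -kk -a` dump like the one above by hand is tedious; a small filter can pull out just the threads whose kernel stacks go through the vnode-locking path. This is only an illustrative sketch (the helper name `vnode_waiters` and the symbol list are my own choices, not a FreeBSD tool):

```shell
# vnode_waiters: print the PID, TID and command name of every thread in a
# saved `procstat -kk -a` dump whose kernel stack mentions one of the
# vnode-locking functions.  The symbol list is an assumption; extend it
# as needed (e.g. with __lockmgr_args or VOP_LOCK1_APV).
vnode_waiters() {
    awk '/_vn_lock\+|vget\+|vop_stdlock\+/ { print $1, $2, $3 }' "$1"
}
```

Run against the dump above, this would single out the stuck `find` thread (and any other thread touching those functions), which is exactly what one scans for when hunting a potential lock holder.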
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 09/11/2016 15:58, Eric van Gyzen wrote:
> On 11/09/2016 07:48, Henri Hennebert wrote:
>> I encounter a strange deadlock on
>> [...]
>> Any idea?
>
> I can't help with the deadlock, but someone who _can_ help will probably
> ask for the output of "procstat -kk PID" with the PID of the "find" process.

In fact, it's procstat -kk -a. With just one thread we would see that a
thread is blocked on something, but we won't see why that something can
not be acquired.
-- 
Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/09/2016 19:23, Thierry Thomas wrote:
> On Wed, 9 Nov 2016 at 15:03:49 +0100, Henri Hennebert wrote:
>> [root@avoriaz ~]# procstat -kk 85656
>>   PID    TID COMM             TDNAME           KSTACK
>> 85656 101112 find             -                mi_switch+0xd2 sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572 kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb
>
> It looks similar to the problem reported in PR 205163
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205163
> Maybe caused by too small values for some vfs.zfs.arc*.
> Could you please list the sysctl values for vfs.zfs.arc_max and the others?
>
> Regards,

[root@avoriaz ~]# sysctl vfs.zfs
vfs.zfs.trim.max_interval: 1
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.enabled: 1
vfs.zfs.vol.unmap_enabled: 1
vfs.zfs.vol.recursive: 0
vfs.zfs.vol.mode: 1
vfs.zfs.version.zpl: 5
vfs.zfs.version.spa: 5000
vfs.zfs.version.acl: 1
vfs.zfs.version.ioctl: 6
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
vfs.zfs.sync_pass_rewrite: 2
vfs.zfs.sync_pass_dont_compress: 5
vfs.zfs.sync_pass_deferred_free: 2
vfs.zfs.zio.exclude_metadata: 0
vfs.zfs.zio.use_uma: 1
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_replay_disable: 0
vfs.zfs.min_auto_ashift: 9
vfs.zfs.max_auto_ashift: 13
vfs.zfs.vdev.trim_max_pending: 1
vfs.zfs.vdev.bio_delete_disable: 0
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.trim_max_active: 64
vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.scrub_max_active: 2
vfs.zfs.vdev.scrub_min_active: 1
vfs.zfs.vdev.async_write_max_active: 10
vfs.zfs.vdev.async_write_min_active: 1
vfs.zfs.vdev.async_read_max_active: 3
vfs.zfs.vdev.async_read_min_active: 1
vfs.zfs.vdev.sync_write_max_active: 10
vfs.zfs.vdev.sync_write_min_active: 10
vfs.zfs.vdev.sync_read_max_active: 10
vfs.zfs.vdev.sync_read_min_active: 10
vfs.zfs.vdev.max_active: 1000
vfs.zfs.vdev.async_write_active_max_dirty_percent: 60
vfs.zfs.vdev.async_write_active_min_dirty_percent: 30
vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1
vfs.zfs.vdev.mirror.non_rotating_inc: 0
vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576
vfs.zfs.vdev.mirror.rotating_seek_inc: 5
vfs.zfs.vdev.mirror.rotating_inc: 0
vfs.zfs.vdev.trim_on_init: 1
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 0
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.metaslabs_per_vdev: 200
vfs.zfs.txg.timeout: 5
vfs.zfs.space_map_blksz: 4096
vfs.zfs.spa_slop_shift: 5
vfs.zfs.spa_asize_inflation: 24
vfs.zfs.deadman_enabled: 1
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_synctime_ms: 100
vfs.zfs.debug_flags: 0
vfs.zfs.recover: 0
vfs.zfs.spa_load_verify_data: 1
vfs.zfs.spa_load_verify_metadata: 1
vfs.zfs.spa_load_verify_maxinflight: 1
vfs.zfs.ccw_retry_interval: 300
vfs.zfs.check_hostid: 1
vfs.zfs.mg_fragmentation_threshold: 85
vfs.zfs.mg_noalloc_threshold: 0
vfs.zfs.condense_pct: 200
vfs.zfs.metaslab.bias_enabled: 1
vfs.zfs.metaslab.lba_weighting_enabled: 1
vfs.zfs.metaslab.fragmentation_factor_enabled: 1
vfs.zfs.metaslab.preload_enabled: 1
vfs.zfs.metaslab.preload_limit: 3
vfs.zfs.metaslab.unload_delay: 8
vfs.zfs.metaslab.load_pct: 50
vfs.zfs.metaslab.min_alloc_size: 33554432
vfs.zfs.metaslab.df_free_pct: 4
vfs.zfs.metaslab.df_alloc_threshold: 131072
vfs.zfs.metaslab.debug_unload: 0
vfs.zfs.metaslab.debug_load: 0
vfs.zfs.metaslab.fragmentation_threshold: 70
vfs.zfs.metaslab.gang_bang: 16777217
vfs.zfs.free_bpobj_enabled: 1
vfs.zfs.free_max_blocks: 18446744073709551615
vfs.zfs.no_scrub_prefetch: 0
vfs.zfs.no_scrub_io: 0
vfs.zfs.resilver_min_time_ms: 3000
vfs.zfs.free_min_time_ms: 1000
vfs.zfs.scan_min_time_ms: 1000
vfs.zfs.scan_idle: 50
vfs.zfs.scrub_delay: 4
vfs.zfs.resilver_delay: 2
vfs.zfs.top_maxinflight: 32
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.max_distance: 8388608
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.delay_scale: 50
vfs.zfs.delay_min_dirty_percent: 60
vfs.zfs.dirty_data_sync: 67108864
vfs.zfs.dirty_data_max_percent: 10
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max: 373664153
vfs.zfs.max_recordsize: 1048576
vfs.zfs.mdcomp_disable: 0
vfs.zfs.nopwrite_enabled: 1
vfs.zfs.dedup.prefetch: 1
vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 24202240
vfs.zfs.mfu_ghost_metadata_lsize: 136404992
vfs.zfs.mfu_ghost_size: 160607232
vfs.zfs.mfu_data_lsize: 449569280
vfs.zfs.mfu_metadata_lsize: 102724608
vfs.zfs.mfu_size: 714202624
vfs.zfs.mru_ghost_data_lsize: 874834432
vfs.zfs.mru_ghost_metadata_lsize: 387692032
vfs.zfs.mru_ghost_size: 1262526464
vfs.zfs.mru_data_lsize: 151275008
vfs.zfs.mru_metadata_lsize: 13547008
vfs.zfs.mru_size: 322614272
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 2916352
vfs.zfs.l2arc_norw: 1
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On Wed, 9 Nov 2016 at 15:03:49 +0100, Henri Hennebert wrote:
> [root@avoriaz ~]# procstat -kk 85656
>   PID    TID COMM             TDNAME           KSTACK
> 85656 101112 find             -                mi_switch+0xd2
> sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c
> VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679
> vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572
> kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb

It looks similar to the problem reported in PR 205163
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205163
Maybe caused by too small values for some vfs.zfs.arc*.
Could you please list the sysctl values for vfs.zfs.arc_max and the others?

Regards,
-- 
Th. Thomas.
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/09/2016 14:58, Eric van Gyzen wrote:
> On 11/09/2016 07:48, Henri Hennebert wrote:
>> I encounter a strange deadlock on
>> [...]
>> Any idea?
>
> I can't help with the deadlock, but someone who _can_ help will probably
> ask for the output of "procstat -kk PID" with the PID of the "find" process.
>
> Eric

[root@avoriaz ~]# procstat -kk 85656
  PID    TID COMM             TDNAME           KSTACK
85656 101112 find             -                mi_switch+0xd2 sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572 kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/09/2016 07:48, Henri Hennebert wrote:
> I encounter a strange deadlock on
> [...]
> 5 days later `periodic daily` is locked on the same directory :-o
>
> Any idea?

I can't help with the deadlock, but someone who _can_ help will probably
ask for the output of "procstat -kk PID" with the PID of the "find" process.

Eric
Freebsd 11.0 RELEASE - ZFS deadlock
I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260:
Fri Nov 4 02:51:33 CET 2016
r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64

This system is exclusively running on zfs.

After 3 or 4 days, `periodic daily` is locked in the directory
/usr/local/news/bin

[root@avoriaz ~]# ps xa|grep find
85656  -  D     0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune
  -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
  462  1  S+    0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
  PID COMM  FD   T V FLAGS REF    OFFSET PRO NAME
85656 find  text v r r---    -         -   - /usr/bin/find
85656 find  cwd  v d r---    -         -   - /usr/local/news/bin
85656 find  root v d r---    -         -   - /
85656 find     0 v c r---    3         0   - /dev/null
85656 find     1 p - rw--    1         0   - -
85656 find     2 v r -w--    7        17   - -
85656 find     3 v d r---    1         0   - /home/root
85656 find     4 v d r---    1         0   - /home/root
85656 find     5 v d rn--    1 533545184   - /usr/local/news/bin
[root@avoriaz ~]#

If I try `ls /usr/local/news/bin`, it is also locked.

After `shutdown -r now` the system remains locked after the line '0 0 0 0 0 0'.

After a reset and reboot I can access /usr/local/news/bin.

I deleted this directory and reinstalled the package: `portupgrade -fu news/inn`.

5 days later `periodic daily` is locked on the same directory :-o

Any idea?

Henri
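The first symptom in this report is the `find` process stuck in state D (uninterruptible sleep) in the `ps` output. When checking periodically for such hangs, a small filter over a saved `ps` listing makes them easy to spot; the helper name `dstate` and the column layout assumed here (`PID TT STAT TIME COMMAND`, as in `ps xa` above) are illustrative, not a FreeBSD tool:

```shell
# dstate: from a saved `ps xa` style listing (PID TT STAT TIME COMMAND),
# print the PID, state and command of every process whose STAT column
# begins with 'D' (uninterruptible sleep), skipping the header line.
dstate() {
    awk 'NR > 1 && $3 ~ /^D/ { print $1, $3, $5 }' "$1"
}
```

Against the listing above this would report only the hung `find`, not the `grep` that found it.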