Re: 12.0-RELEASE zfs/vnode deadlock issue
On Mon, Mar 4, 2019 at 5:29 PM Andriy Gapon wrote:
> On 04/03/2019 22:35, Nick Rogers wrote:
> > v_lock = {lock_object = {lo_name = 0x8144af45 "zfs", lo_flags = 117112840,
> > lo_data = 0, lo_witness = 0x0}, lk_lock = 18446744073709551605,
> > lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}
>
> Hmm, lk_lock looks bogus.
> 18446744073709551605 == 0xfffffffffffffff5 and it's LK_SHARE |
> LK_EXCLUSIVE_WAITERS with 0xfff shared owners.
> Perhaps, this is a result of LK_SHARERS_LOCK(-1).
>
> Is your kernel compiled with INVARIANTS and INVARIANT_SUPPORT?
> I suspect that the vnode was accessed (unlocked?) through a stale pointer
> after it was recycled.

I don't believe so - it's basically amd64 GENERIC w/ a reduced set of
modules and the static zfs option.

> --
> Andriy Gapon

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 12.0-RELEASE zfs/vnode deadlock issue
On 04/03/2019 22:35, Nick Rogers wrote:
> v_lock = {lock_object = {lo_name = 0x8144af45 "zfs", lo_flags = 117112840,
> lo_data = 0, lo_witness = 0x0}, lk_lock = 18446744073709551605,
> lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}

Hmm, lk_lock looks bogus.
18446744073709551605 == 0xfffffffffffffff5 and it's LK_SHARE |
LK_EXCLUSIVE_WAITERS with 0xfff shared owners.
Perhaps, this is a result of LK_SHARERS_LOCK(-1).

Is your kernel compiled with INVARIANTS and INVARIANT_SUPPORT?
I suspect that the vnode was accessed (unlocked?) through a stale pointer
after it was recycled.

--
Andriy Gapon
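[Editor's note: Andriy's decoding of lk_lock can be reproduced mechanically. Below is a small sketch; the bit values are copied from FreeBSD's sys/sys/lockmgr.h as of 12.0 (verify against your source tree). It pulls the flag bits and the shared-owner count out of the dumped word, and shows that LK_SHARERS_LOCK(-1) with the exclusive-waiters bit set yields exactly the value in the dump.]

```python
# Decode a lockmgr lk_lock word the way sys/sys/lockmgr.h lays it out.
# Constants copied from FreeBSD 12's sys/sys/lockmgr.h; treat this as an
# illustration, not a substitute for reading the header.
LK_SHARE              = 0x01  # lock held in shared mode
LK_SHARED_WAITERS     = 0x02
LK_EXCLUSIVE_WAITERS  = 0x04
LK_EXCLUSIVE_SPINNERS = 0x08
LK_FLAGMASK = (LK_SHARE | LK_SHARED_WAITERS |
               LK_EXCLUSIVE_WAITERS | LK_EXCLUSIVE_SPINNERS)
LK_SHARERS_SHIFT = 4
U64 = (1 << 64) - 1

def decode_lk_lock(lk_lock):
    """Return (list of set flag names, shared-owner count) for a 64-bit lk_lock."""
    names = {LK_SHARE: "LK_SHARE",
             LK_SHARED_WAITERS: "LK_SHARED_WAITERS",
             LK_EXCLUSIVE_WAITERS: "LK_EXCLUSIVE_WAITERS",
             LK_EXCLUSIVE_SPINNERS: "LK_EXCLUSIVE_SPINNERS"}
    flags = [n for bit, n in names.items() if lk_lock & bit]
    sharers = ((lk_lock & ~LK_FLAGMASK) & U64) >> LK_SHARERS_SHIFT
    return flags, sharers

val = 18446744073709551605  # lk_lock from the v_lock dump above
print(hex(val), decode_lk_lock(val))

# LK_SHARERS_LOCK(-1) -- a sharer count that underflowed to -1 -- plus the
# exclusive-waiters bit reproduces the dumped value exactly:
underflow = (((-1 << LK_SHARERS_SHIFT) | LK_SHARE) & U64) | LK_EXCLUSIVE_WAITERS
print(underflow == val)
```

This is consistent with the stale-pointer theory: a vnode recycled out from under a reader can see one vput()/unlock too many, driving the sharer count below zero.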
Re: 12.0-RELEASE zfs/vnode deadlock issue
On Sat, Mar 2, 2019 at 12:48 PM Andriy Gapon wrote:
> On 01/03/2019 17:00, Nick Rogers wrote:
> > 36704 101146 perl - mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5
> > lockmgr_xlock_hard+0x19c VOP_LOCK1_APV+0x7e _vn_lock+0x40
> > zfs_znode_alloc+0x434 zfs_mknode+0xa9d zfs_freebsd_create+0x512
> > VOP_CREATE_APV+0x78 vn_open_cred+0x2c9 kern_openat+0x20c
> > amd64_syscall+0x369 fast_syscall_common+0x101
>
> I suspect that this thread is a root cause of the problem.
> In this place, the vnode should be freshly created and not visible to
> anything but the current thread. So, vn_lock() should always immediately
> succeed. I cannot understand how the vnode lock could be held by another
> thread.

It happened again. I tried to get a backtrace from the offending thread and
one of the others waiting for it. At the moment I have access to this
particular system in its bad state and can leave it like this for as long
as possible, so let me know if there's something else useful I can get out
of the debugger.
courtland# procstat -kka | grep zfs
    0 100140 kernel   zfsvfs              mi_switch+0xe1 sleepq_wait+0x2c _sleep+0x237 taskqueue_thread_loop+0xf1 fork_exit+0x83 fork_trampoline+0xe
    0 100424 kernel   zfs_vn_rele_taskq   mi_switch+0xe1 sleepq_wait+0x2c _sleep+0x237 taskqueue_thread_loop+0xf1 fork_exit+0x83 fork_trampoline+0xe
   23 100119 zfskern  arc_reclaim_thread  mi_switch+0xe1 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a arc_reclaim_thread+0x146 fork_exit+0x83 fork_trampoline+0xe
   23 100120 zfskern  arc_dnlc_evicts_thr mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x152 arc_dnlc_evicts_thread+0x16f fork_exit+0x83 fork_trampoline+0xe
   23 100122 zfskern  dbuf_evict_thread   mi_switch+0xe1 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a dbuf_evict_thread+0x1c8 fork_exit+0x83 fork_trampoline+0xe
   23 100139 zfskern  l2arc_feed_thread   mi_switch+0xe1 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a l2arc_feed_thread+0x219 fork_exit+0x83 fork_trampoline+0xe
   23 100405 zfskern  trim zroot          mi_switch+0xe1 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a trim_thread+0x11f fork_exit+0x83 fork_trampoline+0xe
   23 100441 zfskern  txg_thread_enter    mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x152 txg_quiesce+0x21b txg_quiesce_thread+0x11b fork_exit+0x83 fork_trampoline+0xe
   23 100442 zfskern  txg_thread_enter    mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x152 txg_sync_thread+0x13b fork_exit+0x83 fork_trampoline+0xe
   23 100443 zfskern  solthread 0xfff     mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0xcc fork_exit+0x83 fork_trampoline+0xe
   23 100444 zfskern  solthread 0xfff     mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0xcc fork_exit+0x83 fork_trampoline+0xe
 7476 100751 postgres -                   mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x152 dmu_tx_wait+0x2eb dmu_tx_assign+0x48 zfs_freebsd_create+0x4c8 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9 kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
 7480 100527 postgres -                   mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_xlock_hard+0x19c VOP_LOCK1_APV+0x7e _vn_lock+0x40 zfs_znode_alloc+0x434 zfs_mknode+0xa9d zfs_freebsd_create+0x512 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9 kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
46101 100471 postgres -                   mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x152 dmu_tx_wait+0x2eb dmu_tx_assign+0x48 zfs_freebsd_create+0x4c8 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9 kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
52625 100488 perl     -                   mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e _vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77 sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
52675 100643 csh      -                   mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e _vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77 sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
52826 100562 ls       -                   mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e _vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77 sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
52889 100641 bash     -                   mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e _vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77 sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
courtland# kgdb
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU
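[Editor's note: when triaging a hang like this, it helps to bucket the procstat -kka stacks by what each thread is sleeping in. A rough helper, assuming the plain whitespace-separated procstat layout shown above; the sample lines are trimmed copies of that output, and field positions may vary across FreeBSD versions.]

```python
# Count how many threads are parked in each sleep primitive by taking the
# first stack frame after sleepq_wait/sleepq_timedwait in each procstat line.
from collections import Counter

# Trimmed copies of lines from the procstat -kka output above.
sample = [
    "52625 100488 perl - mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5 "
    "lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e _vn_lock+0x40 zfs_root+0x6d",
    "7480 100527 postgres - mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5 "
    "lockmgr_xlock_hard+0x19c VOP_LOCK1_APV+0x7e _vn_lock+0x40 zfs_znode_alloc+0x434",
    "7476 100751 postgres - mi_switch+0xe1 sleepq_wait+0x2c _cv_wait+0x152 "
    "dmu_tx_wait+0x2eb dmu_tx_assign+0x48",
]

def blocked_in(line):
    """Name of the frame a thread entered right after going to sleep."""
    frames = [f.split("+")[0] for f in line.split() if "+0x" in f]
    for i, frame in enumerate(frames):
        if frame.startswith("sleepq_") and i + 1 < len(frames):
            return frames[i + 1]
    return None

print(Counter(b for b in map(blocked_in, sample) if b))
```

On the output above this separates the lockmgr sleepers (sleeplk: the vnode-lock waiters) from the dmu_tx_wait sleepers, which is exactly the split Andriy keyed on.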
Re: possibly silly binmiscctl question
On Mon, Mar 4, 2019 at 11:50 AM tech-lists wrote:
>
> Hi,
>
> If I give binmiscctl the magic for arm6 and then for say mips64, will
> this break things?
>
> Let's say I'm using an amd64 box to cross-compile using poudriere
> for arm6 and mips64 ports. Can I do both on the same box at the same time?
> Or do I need to let's say the arm6 run to finish, then give binmiscctl
> its magic strings for mips64, and THEN run the build run for that arch?

This is what the qemu-user-static rc script does -- there are no problems.
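[Editor's note: for context on why parallel use is fine -- each binmiscctl entry is registered under its own name with its own ELF-magic/mask pair, so activators for different architectures never collide. A sketch of the shape of the registration (interpreter paths are the usual qemu-user-static ones; the actual --magic/--mask byte strings are long ELF header patterns, elided here -- see the qemu_user_static rc script for the real values):]

```sh
# Load the image activator module, then register one entry per target arch.
kldload -n imgact_binmisc

# One entry per architecture; name, magic, and mask are per-entry, so the
# arm and mips64 registrations coexist. Magic/mask strings elided.
binmiscctl add armv6  --interpreter /usr/local/bin/qemu-arm-static \
    --magic ... --mask ... --size 20 --offset 0 --set-enabled
binmiscctl add mips64 --interpreter /usr/local/bin/qemu-mips64-static \
    --magic ... --mask ... --size 20 --offset 0 --set-enabled

binmiscctl list   # both entries stay active side by side
```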
Re: possibly silly binmiscctl question
On Mon, Mar 04, 2019 at 06:53:01PM +0100, Kurt Jaeger wrote:
> Hi!
>
> > If I give binmiscctl the magic for arm6 and then for say mips64, will
> > this break things?
> >
> > Let's say I'm using an amd64 box to cross-compile using poudriere
> > for arm6 and mips64 ports. Can I do both on the same box at the same
> > time? Or do I need to let's say the arm6 run to finish, then give
> > binmiscctl its magic strings for mips64, and THEN run the build run
> > for that arch?
>
> I used two archs in parallel in the past, that was no problem.

oh that's great news, thanks (really I should have made sure before doing
it, lol!)

--
J.
Re: possibly silly binmiscctl question
Hi!

> If I give binmiscctl the magic for arm6 and then for say mips64, will
> this break things?
>
> Let's say I'm using an amd64 box to cross-compile using poudriere
> for arm6 and mips64 ports. Can I do both on the same box at the same
> time? Or do I need to let's say the arm6 run to finish, then give
> binmiscctl its magic strings for mips64, and THEN run the build run
> for that arch?

I used two archs in parallel in the past, that was no problem.

--
p...@opsec.eu    +49 171 3101372    One year to go !
possibly silly binmiscctl question
Hi,

If I give binmiscctl the magic for arm6 and then for say mips64, will
this break things?

Let's say I'm using an amd64 box to cross-compile using poudriere
for arm6 and mips64 ports. Can I do both on the same box at the same
time? Or do I need to let's say the arm6 run to finish, then give
binmiscctl its magic strings for mips64, and THEN run the build run for
that arch?

thanks,
--
J.
Re: 12.0-RELEASE zfs/vnode deadlock issue
Thanks for the insight. It does appear that in all instances of this
problem there is always one thread stuck in zfs_znode_alloc. Unfortunately
it's always a different application (e.g., perl, sh, postgres). I will post
more information in the bug.

On Sat, Mar 2, 2019 at 12:48 PM Andriy Gapon wrote:
> On 01/03/2019 17:00, Nick Rogers wrote:
> > 36704 101146 perl - mi_switch+0xe1 sleepq_wait+0x2c sleeplk+0x1c5
> > lockmgr_xlock_hard+0x19c VOP_LOCK1_APV+0x7e _vn_lock+0x40
> > zfs_znode_alloc+0x434 zfs_mknode+0xa9d zfs_freebsd_create+0x512
> > VOP_CREATE_APV+0x78 vn_open_cred+0x2c9 kern_openat+0x20c
> > amd64_syscall+0x369 fast_syscall_common+0x101
>
> I suspect that this thread is a root cause of the problem.
> In this place, the vnode should be freshly created and not visible to
> anything but the current thread. So, vn_lock() should always immediately
> succeed. I cannot understand how the vnode lock could be held by another
> thread.
>
> --
> Andriy Gapon
Re: 12.0-RELEASE zfs/vnode deadlock issue
On Sat, Mar 2, 2019 at 5:27 PM Peter Avalos via freebsd-stable
<freebsd-stable@freebsd.org> wrote:
>
> > On Mar 1, 2019, at 7:00 AM, Nick Rogers wrote:
> >
> > I am hoping someone can help me figure out if this is a legitimate bug,
> > or something already fixed in 12-STABLE. I wish I could reproduce it
> > reliably to try against STABLE, but there doesn't appear to be any
> > related ZFS fixes not in RELEASE. Thanks.
>
> I have also experienced this problem, but I haven't been able to
> troubleshoot it at all.

I've opened a bug report, so if you have any more information about how it
is affecting you that may be helpful to share here.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236220

> Peter