Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-04 Thread Nick Rogers
On Mon, Mar 4, 2019 at 5:29 PM Andriy Gapon  wrote:

> On 04/03/2019 22:35, Nick Rogers wrote:
> > v_lock = {lock_object = {lo_name =
> > 0x8144af45 "zfs", lo_flags = 117112840, lo_data = 0, lo_witness =
> > 0x0}, lk_lock = 18446744073709551605, lk_exslpfail = 0, lk_timo = 51,
> > lk_pri = 96}
>
> Hmm, lk_lock looks bogus.
> 18446744073709551605 == 0xfffffffffffffff5 and it's LK_SHARE |
> LK_EXCLUSIVE_WAITERS with 0xfffffffffffffff shared owners.
> Perhaps, this is a result of LK_SHARERS_LOCK(-1).
>
> Is your kernel compiled with INVARIANTS and INVARIANT_SUPPORT?
> I suspect that the vnode was accessed (unlocked?) through a stale pointer
> after
> it was recycled.
>

I don't believe so - it's basically amd64 GENERIC w/ a reduced set of
modules and static zfs option.


> --
> Andriy Gapon
>
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-04 Thread Andriy Gapon
On 04/03/2019 22:35, Nick Rogers wrote:
> v_lock = {lock_object = {lo_name =
> 0x8144af45 "zfs", lo_flags = 117112840, lo_data = 0, lo_witness =
> 0x0}, lk_lock = 18446744073709551605, lk_exslpfail = 0, lk_timo = 51,
> lk_pri = 96}

Hmm, lk_lock looks bogus.
18446744073709551605 == 0xfffffffffffffff5 and it's LK_SHARE |
LK_EXCLUSIVE_WAITERS with 0xfffffffffffffff shared owners.
Perhaps, this is a result of LK_SHARERS_LOCK(-1).

Is your kernel compiled with INVARIANTS and INVARIANT_SUPPORT?
I suspect that the vnode was accessed (unlocked?) through a stale pointer after
it was recycled.

-- 
Andriy Gapon


Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-04 Thread Nick Rogers
On Sat, Mar 2, 2019 at 12:48 PM Andriy Gapon  wrote:

> On 01/03/2019 17:00, Nick Rogers wrote:
> > 36704 101146 perl-   mi_switch+0xe1
> > sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_xlock_hard+0x19c
> VOP_LOCK1_APV+0x7e
> > _vn_lock+0x40 zfs_znode_alloc+0x434 zfs_mknode+0xa9d
> > zfs_freebsd_create+0x512 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
> > kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
>
> I suspect that this thread is a root cause of the problem.
> In this place, the vnode should be freshly created and not visible to
> anything
> but the current thread.  So, vn_lock() should always immediately succeed.
> I
> cannot understand how the vnode lock could be held by another thread.
>

It happened again. I tried to get a backtrace from the offending thread and
one of the others waiting for it. At the moment I have access to this
particular system in its bad state and can leave it like this for as long
as possible, so let me know if there's something else useful I can get out
of the debugger.

courtland# procstat -kka | grep zfs
0 100140 kernel  zfsvfs  mi_switch+0xe1
sleepq_wait+0x2c _sleep+0x237 taskqueue_thread_loop+0xf1 fork_exit+0x83
fork_trampoline+0xe
0 100424 kernel  zfs_vn_rele_taskq   mi_switch+0xe1
sleepq_wait+0x2c _sleep+0x237 taskqueue_thread_loop+0xf1 fork_exit+0x83
fork_trampoline+0xe
   23 100119 zfskern arc_reclaim_thread  mi_switch+0xe1
sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a arc_reclaim_thread+0x146
fork_exit+0x83 fork_trampoline+0xe
   23 100120 zfskern arc_dnlc_evicts_thr mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 arc_dnlc_evicts_thread+0x16f fork_exit+0x83
fork_trampoline+0xe
   23 100122 zfskern dbuf_evict_thread   mi_switch+0xe1
sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a dbuf_evict_thread+0x1c8
fork_exit+0x83 fork_trampoline+0xe
   23 100139 zfskern l2arc_feed_thread   mi_switch+0xe1
sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a l2arc_feed_thread+0x219
fork_exit+0x83 fork_trampoline+0xe
   23 100405 zfskern trim zroot  mi_switch+0xe1
sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a trim_thread+0x11f
fork_exit+0x83 fork_trampoline+0xe
   23 100441 zfskern txg_thread_entermi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 txg_quiesce+0x21b txg_quiesce_thread+0x11b
fork_exit+0x83 fork_trampoline+0xe
   23 100442 zfskern txg_thread_entermi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 txg_sync_thread+0x13b fork_exit+0x83
fork_trampoline+0xe
   23 100443 zfskern solthread 0xfff mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0xcc fork_exit+0x83
fork_trampoline+0xe
   23 100444 zfskern solthread 0xfff mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0xcc fork_exit+0x83
fork_trampoline+0xe
 7476 100751 postgres-   mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 dmu_tx_wait+0x2eb dmu_tx_assign+0x48
zfs_freebsd_create+0x4c8 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
 7480 100527 postgres-   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_xlock_hard+0x19c VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_znode_alloc+0x434 zfs_mknode+0xa9d
zfs_freebsd_create+0x512 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
46101 100471 postgres-   mi_switch+0xe1
sleepq_wait+0x2c _cv_wait+0x152 dmu_tx_wait+0x2eb dmu_tx_assign+0x48
zfs_freebsd_create+0x4c8 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
52625 100488 perl-   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77
sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
52675 100643 csh -   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77
sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
52826 100562 ls  -   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77
sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
52889 100641 bash-   mi_switch+0xe1
sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_slock_hard+0x2c5 VOP_LOCK1_APV+0x7e
_vn_lock+0x40 zfs_root+0x6d lookup+0x933 namei+0x44b kern_statat+0x77
sys_fstatat+0x2f amd64_syscall+0x369 fast_syscall_common+0x101
courtland# kgdb
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU 

Re: possibly silly binmiscctl question

2019-03-04 Thread Kyle Evans
On Mon, Mar 4, 2019 at 11:50 AM tech-lists  wrote:
>
> Hi,
>
> If I give binmiscctl the magic for arm6 and then for say mips64, will
> this break things?
>
> Let's say I'm using an amd64 box to cross-compile using poudriere
> for arm6 and mips64 ports. Can I do both on the same box at the same time?
> Or do I need to let's say the arm6 run to finish, then give binmiscctl
> its magic strings for mips64, and THEN run the build run for that arch?
>

This is what the qemu-user-static rc script does -- there are no problems.
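For what it's worth, a sketch of what that looks like. The interpreter paths and the --magic/--mask byte strings below are placeholders, not the real ELF headers; the qemu-user-static rc script carries the authoritative ones. RUN=echo makes it a dry run, since the real command needs root on FreeBSD:

```shell
#!/bin/sh
# Sketch: register two emulators side by side, as the qemu-user-static rc
# script does.  RUN=echo prints the commands instead of executing them;
# clear it to actually call binmiscctl (as root, on FreeBSD).
RUN=echo

register() {
    arch=$1 interp=$2 magic=$3 mask=$4
    $RUN binmiscctl add "$arch" \
        --interpreter "$interp" \
        --magic "$magic" --mask "$mask" \
        --size 20 --set-enabled
}

# Both entries coexist; the kernel picks the interpreter per binary by
# matching the ELF header, so concurrent poudriere builds for different
# arches are fine.
register armv6  /usr/local/bin/qemu-arm-static    MAGIC_ARMV6  MASK_ARMV6
register mips64 /usr/local/bin/qemu-mips64-static MAGIC_MIPS64 MASK_MIPS64
```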


Re: possibly silly binmiscctl question

2019-03-04 Thread tech-lists

On Mon, Mar 04, 2019 at 06:53:01PM +0100, Kurt Jaeger wrote:

> Hi!
>
> > If I give binmiscctl the magic for arm6 and then for say mips64, will
> > this break things?
> >
> > Let's say I'm using an amd64 box to cross-compile using poudriere
> > for arm6 and mips64 ports. Can I do both on the same box at the same
> > time? Or do I need to let's say the arm6 run to finish, then give
> > binmiscctl
> > its magic strings for mips64, and THEN run the build run for that arch?
>
> I used two archs in parallel in the past, that was no problem.


oh that's great news, thanks

(really I should have made sure before doing it, lol!)
--
J.


Re: possibly silly binmiscctl question

2019-03-04 Thread Kurt Jaeger
Hi!

> If I give binmiscctl the magic for arm6 and then for say mips64, will
> this break things?
> 
> Let's say I'm using an amd64 box to cross-compile using poudriere
> for arm6 and mips64 ports. Can I do both on the same box at the same
> time? Or do I need to let's say the arm6 run to finish, then give
> binmiscctl
> its magic strings for mips64, and THEN run the build run for that arch?

I used two archs in parallel in the past, that was no problem.

-- 
p...@opsec.eu+49 171 3101372One year to go !


possibly silly binmiscctl question

2019-03-04 Thread tech-lists

Hi,

If I give binmiscctl the magic for arm6 and then for say mips64, will
this break things?

Let's say I'm using an amd64 box to cross-compile using poudriere
for arm6 and mips64 ports. Can I do both on the same box at the same time? 
Or do I need to let's say the arm6 run to finish, then give binmiscctl
its magic strings for mips64, and THEN run the build run for that arch?

thanks,
--
J.


Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-04 Thread Nick Rogers
Thanks for the insight. It does appear that in all instances of this
problem there is always one thread stuck in zfs_znode_alloc. Unfortunately
it's always a different application (e.g., perl, sh, postgres). I will post
more information in the bug.

On Sat, Mar 2, 2019 at 12:48 PM Andriy Gapon  wrote:

> On 01/03/2019 17:00, Nick Rogers wrote:
> > 36704 101146 perl-   mi_switch+0xe1
> > sleepq_wait+0x2c sleeplk+0x1c5 lockmgr_xlock_hard+0x19c
> VOP_LOCK1_APV+0x7e
> > _vn_lock+0x40 zfs_znode_alloc+0x434 zfs_mknode+0xa9d
> > zfs_freebsd_create+0x512 VOP_CREATE_APV+0x78 vn_open_cred+0x2c9
> > kern_openat+0x20c amd64_syscall+0x369 fast_syscall_common+0x101
>
> I suspect that this thread is a root cause of the problem.
> In this place, the vnode should be freshly created and not visible to
> anything
> but the current thread.  So, vn_lock() should always immediately succeed.
> I
> cannot understand how the vnode lock could be held by another thread.
>
> --
> Andriy Gapon
>


Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-04 Thread Nick Rogers
On Sat, Mar 2, 2019 at 5:27 PM Peter Avalos via freebsd-stable
<freebsd-stable@freebsd.org> wrote:

>
> > On Mar 1, 2019, at 7:00 AM, Nick Rogers  wrote:
> >
> > I am hoping someone can help me figure out if this is a legitimate bug,
> or
> > something already fixed in 12-STABLE. I wish I could reproduce it
> reliably
> > to try against STABLE, but there doesn't appear to be any related ZFS
> fixes
> > not in RELEASE. Thanks.
> >
>
> I have also experienced this problem, but I haven’t been able to
> troubleshoot it at all.
>

I've opened a bug report, so if you have any more information about how it
is affecting you, it may be helpful to share it there.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236220


>
> Peter