Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-18 Thread Henri Hennebert



On 11/18/2016 13:30, Andriy Gapon wrote:

On 14/11/2016 14:00, Henri Hennebert wrote:

On 11/14/2016 12:45, Andriy Gapon wrote:

Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
happens to be the first field of 'struct faultstate'.  So, could you please go
to frame and print '*m' and '*(struct faultstate *)m' ?


(kgdb) fr 4
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
753             msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev =
0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0},
  pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v =
18446735281353604048}}, listq = {tqe_next = 0x0,
tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11,
phys_addr = 3389358080, md = {pv_list = {
  tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode =
6}, wire_count = 0, busy_lock = 6, hold_count = 0,
  flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0',
segind = 3 '\003', order = 13 '\r',
  pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}


If I interpret this correctly the page is in the 'exclusive busy' state.
Unfortunately, I can't tell much beyond that.
But I am confident that this is the root cause of the lock-up.
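
For what it's worth, busy_lock can be decoded mechanically.  A minimal sketch, using the
VPB_* bit values as I read them in sys/vm/vm_page.h on 11.x (please double-check them
against the source tree): 6 = VPB_BIT_EXCLUSIVE | VPB_BIT_WAITERS, i.e. exclusive busy
with at least one sleeper (presumably this very thread) already queued on it.

#include <stdio.h>

/* Bit values assumed from sys/vm/vm_page.h (11.x); verify before relying on them. */
#define VPB_BIT_SHARED     0x01
#define VPB_BIT_EXCLUSIVE  0x02
#define VPB_BIT_WAITERS    0x04

int
main(void)
{
        unsigned int busy_lock = 6;     /* from 'print *m' above */

        printf("shared=%d exclusive=%d waiters=%d\n",
            !!(busy_lock & VPB_BIT_SHARED),
            !!(busy_lock & VPB_BIT_EXCLUSIVE),
            !!(busy_lock & VPB_BIT_WAITERS));
        return (0);
}

This prints shared=0 exclusive=1 waiters=1.  The waiters bit being set is expected here,
since this thread is itself sleeping in vm_page_busy_sleep().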


(kgdb) print *(struct faultstate *)m
$2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, first_m =
0xf800dc5d85c0,
  first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, entry
= 0x0, lookup_still_valid = -546779784,
  vp = 0x601aa}
(kgdb)


I was wrong on this one as 'm' is actually a pointer, so the above is not
correct.  Maybe 'info reg' in frame 5 would give a clue about the value of 'fs'.
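
To spell out why the cast could not work (a sketch only; the names follow vm_fault_hold()):

/*
 * In vm_fault_hold() the faultstate is a local variable and the call at
 * vm_fault.c:495 passes the *value* of its first field:
 *
 *     struct faultstate fs;                      (local on the stack)
 *     vm_page_sleep_if_busy(fs.m, "vmpfw");      (passes the pointer fs.m)
 *
 * So in frames 4/5 'm' is a copy of fs.m, and (struct faultstate *)m just
 * reinterprets the vm_page that fs.m points to.  Getting at fs would need
 * the address of the stack slot holding fs.m (i.e. &fs.m == &fs, since m is
 * the first field), which is why the caller's registers/frame are the place
 * to look.
 */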


(kgdb) fr 5
#5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, msg=0x809c51bc "vmpfw")
    at /usr/src/sys/vm/vm_page.c:1086
1086            vm_page_busy_sleep(m, msg);
(kgdb) info reg
rax            0x0                  0
rbx            0xf800b62e9c78       -8793036514184
rcx            0x0                  0
rdx            0x0                  0
rsi            0x0                  0
rdi            0x0                  0
rbp            0xfe0101836810       0xfe0101836810
rsp            0xfe01018367e0       0xfe01018367e0
r8             0x0                  0
r9             0x0                  0
r10            0x0                  0
r11            0x0                  0
r12            0xf800b642aa00       -879303520
r13            0xf800df68cd40       -8792344834752
r14            0xf800b62e9c60       -8793036514208
r15            0x809c51bc           -2137239108
rip            0x8089dd4d           0x8089dd4d
eflags         0x0                  0
cs             0x0                  0
ss             0x0                  0
ds             0x0                  0
es             0x0                  0
fs             0x0                  0
gs             0x0                  0

I don't know what to do from here.


I am not sure how to proceed from here.
The only thing I can think of is a lock order reversal between the vnode lock
and the page busying quasi-lock.  But examining the code I can not spot it.
Another possibility is a leak of a busy page, but that's hard to debug.
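
Just to spell out the shape of the hang I have in mind (hypothetical; the owner of the
busy page is exactly the part we cannot see):

  innfeed (tid 101245)        holds the vnode lock shared (suspected), faults on a page
                              of that vnode, finds it exclusive-busy, sleeps in "vmpfw"
  unknown thread              holds the page exclusive-busy and either needs the vnode
                              lock itself (lock order reversal) or leaked the busy state
  innd (tids 101243, 101147)  queued for the vnode lock in exclusive mode
  find (tid 101112)           queued for the vnode lock in shared mode behind the
                              exclusive waiters

That would explain why every lookup under /usr/local/news/bin hangs.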

How hard is it to reproduce the problem?


After 7 days all seems normal, with only one copy of innd:

[root@avoriaz ~]# ps xa|grep inn
 1193  -  Is   0:01.40 /usr/local/news/bin/innd -r
13498  -  IN   0:00.01 /usr/local/news/bin/innfeed
 1194 v0- IW   0:00.00 /bin/sh /usr/local/news/bin/innwatch -i 60

I will try to stop and restart innd.

Everything continues to look good:

[root@avoriaz ~]# ps xa|grep inn
31673  -  Ss   0:00.02 /usr/local/news/bin/innd
31694  -  SN   0:00.01 /usr/local/news/bin/innfeed
31674  0  S0:00.01 /bin/sh /usr/local/news/bin/innwatch -i 60


I think the only way to reproduce it is to wait until it occurs by itself...

One thing here: the deadlock has occurred at least 5 times since 10.0-RELEASE, and
always with the directory /usr/local/news/bin.




Maybe Konstantin would have some ideas or suggestions.



Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-18 Thread Andriy Gapon
On 14/11/2016 14:00, Henri Hennebert wrote:
> On 11/14/2016 12:45, Andriy Gapon wrote:
>> Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
>> happens to be the first field of 'struct faultstate'.  So, could you please 
>> go
>> to frame and print '*m' and '*(struct faultstate *)m' ?
>>
> (kgdb) fr 4
> #4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, 
> wmesg= optimized out>) at /usr/src/sys/vm/vm_page.c:753
> 753msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
> (kgdb) print *m
> $1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev =
> 0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0},
>   pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v =
> 18446735281353604048}}, listq = {tqe_next = 0x0,
> tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11,
> phys_addr = 3389358080, md = {pv_list = {
>   tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode 
> =
> 6}, wire_count = 0, busy_lock = 6, hold_count = 0,
>   flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 
> '\0',
> segind = 3 '\003', order = 13 '\r',
>   pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}

If I interpret this correctly the page is in the 'exclusive busy' state.
Unfortunately, I can't tell much beyond that.
But I am confident that this is the root cause of the lock-up.

> (kgdb) print *(struct faultstate *)m
> $2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, 
> first_m =
> 0xf800dc5d85c0,
>   first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, 
> entry
> = 0x0, lookup_still_valid = -546779784,
>   vp = 0x601aa}
> (kgdb)

I was wrong on this one as 'm' is actually a pointer, so the above is not
correct.  Maybe 'info reg' in frame 5 would give a clue about the value of 'fs'.

I am not sure how to proceed from here.
The only thing I can think of is a lock order reversal between the vnode lock
and the page busying quasi-lock.  But examining the code I can not spot it.
Another possibility is a leak of a busy page, but that's hard to debug.

How hard is it to reproduce the problem?

Maybe Konstantin would have some ideas or suggestions.

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-14 Thread Henri Hennebert



On 11/14/2016 12:45, Andriy Gapon wrote:

On 14/11/2016 11:35, Henri Hennebert wrote:



On 11/14/2016 10:07, Andriy Gapon wrote:

Hmm, I've just noticed another interesting thread:
Thread 668 (Thread 101245):
#0  sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x805614b1 in _sleep (ident=, lock=, priority=, wmesg=0x809c51bc
"vmpfw", sbt=0, pr=, flags=) at
/usr/src/sys/kern/kern_synch.c:229
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=) at /usr/src/sys/vm/vm_page.c:753
#5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40,
msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
#6  0x80886be9 in vm_fault_hold (map=, vaddr=, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
/usr/src/sys/vm/vm_fault.c:495
#7  0x80885448 in vm_fault (map=0xf80011d66000, vaddr=, fault_type=4 '\004', fault_flags=) at
/usr/src/sys/vm/vm_fault.c:273
#8  0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) at
/usr/src/sys/amd64/amd64/trap.c:741
#9  0x808d3386 in trap (frame=0xfe0101836c00) at
/usr/src/sys/amd64/amd64/trap.c:333
#10 0x808b7af1 in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:236


This thread is another program from the news system:
668 Thread 101245 (PID=49124: innfeed)  sched_switch (td=0xf800b642aa00,
newtd=0xf8000285f000, flags=) at
/usr/src/sys/kern/sched_ule.c:1973



I strongly suspect that this is the thread that we were looking for.
I think that it has the vnode lock in the shared mode while trying to fault in a
page.



--clip--



Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
happens to be the first field of 'struct faultstate'.  So, could you please go
to frame and print '*m' and '*(struct faultstate *)m' ?


(kgdb) fr 4
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, 
wmesg=) at /usr/src/sys/vm/vm_page.c:753

753 msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev = 
0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0},
  pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v 
= 18446735281353604048}}, listq = {tqe_next = 0x0,
tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex 
= 11, phys_addr = 3389358080, md = {pv_list = {
  tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, 
pat_mode = 6}, wire_count = 0, busy_lock = 6, hold_count = 0,
  flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind 
= 0 '\0', segind = 3 '\003', order = 13 '\r',

  pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}
(kgdb) print *(struct faultstate *)m
$2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, 
first_m = 0xf800dc5d85c0,
  first_object = 0xf800b62e9c60, first_pindex = 11, map = 
0xca058000, entry = 0x0, lookup_still_valid = -546779784,

  vp = 0x601aa}
(kgdb)


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-14 Thread Andriy Gapon
On 14/11/2016 11:35, Henri Hennebert wrote:
> 
> 
> On 11/14/2016 10:07, Andriy Gapon wrote:
>> Hmm, I've just noticed another interesting thread:
>> Thread 668 (Thread 101245):
>> #0  sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, 
>> flags=> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0x80561ae2 in mi_switch (flags=, newtd=0x0) 
>> at
>> /usr/src/sys/kern/kern_synch.c:455
>> #2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>> /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0x805614b1 in _sleep (ident=, lock=> optimized out>, priority=, wmesg=0x809c51bc
>> "vmpfw", sbt=0, pr=, flags=) at
>> /usr/src/sys/kern/kern_synch.c:229
>> #4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, 
>> wmesg=> optimized out>) at /usr/src/sys/vm/vm_page.c:753
>> #5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40,
>> msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
>> #6  0x80886be9 in vm_fault_hold (map=, 
>> vaddr=> optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
>> /usr/src/sys/vm/vm_fault.c:495
>> #7  0x80885448 in vm_fault (map=0xf80011d66000, vaddr=> optimized out>, fault_type=4 '\004', fault_flags=) at
>> /usr/src/sys/vm/vm_fault.c:273
>> #8  0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) 
>> at
>> /usr/src/sys/amd64/amd64/trap.c:741
>> #9  0x808d3386 in trap (frame=0xfe0101836c00) at
>> /usr/src/sys/amd64/amd64/trap.c:333
>> #10 0x808b7af1 in calltrap () at 
>> /usr/src/sys/amd64/amd64/exception.S:236
> 
> This thread is another program from the news system:
> 668 Thread 101245 (PID=49124: innfeed)  sched_switch (td=0xf800b642aa00,
> newtd=0xf8000285f000, flags=) at
> /usr/src/sys/kern/sched_ule.c:1973
> 
>>
>> I strongly suspect that this is the thread that we were looking for.
>> I think that it has the vnode lock in the shared mode while trying to fault
>> in a page.
>>
>> Could you please check that by going to frame 6 and printing 'fs' and 
>> '*fs.vp'?
>> It'd be interesting to understand why this thread is waiting here.
>> So, please also print '*fs.m' and '*fs.object'.
> 
> No luck :-(
> (kgdb) fr 6
> #6  0x80886be9 in vm_fault_hold (map=, 
> vaddr= optimized out>, fault_type=4 '\004',
> fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
> 495vm_page_sleep_if_busy(fs.m, "vmpfw");
> (kgdb) print fs
> Cannot access memory at address 0x1fa0
> (kgdb)

Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
happens to be the first field of 'struct faultstate'.  So, could you please go
to frame and print '*m' and '*(struct faultstate *)m' ?

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-14 Thread Henri Hennebert



On 11/14/2016 10:07, Andriy Gapon wrote:

On 13/11/2016 15:28, Henri Hennebert wrote:

On 11/13/2016 11:06, Andriy Gapon wrote:

On 12/11/2016 14:40, Henri Hennebert wrote:



[snip]

Could you please show 'info local' in frame 14?
I expected that 'nd' variable would be defined there and it may contain some
useful information.


No luck there:

(kgdb) fr 14
#14 0x80636838 in kern_statat (td=0xf80009ba0500, 
flag=, fd=-100, path=0x0,
pathseg=, sbp=, 
hook=0x800e2a388) at /usr/src/sys/kern/vfs_syscalls.c:2160

2160            if ((error = namei(&nd)) != 0)
(kgdb) info local
rights = <value optimized out>
nd = <value optimized out>
error = <value optimized out>
sb = <value optimized out>
(kgdb)



I also tried to get information from the execve arguments of the other threads:

for tid 101250:
(kgdb) fr 10
#10 0x80508ccc in sys_execve (td=0xf800b6429000,
uap=0xfe010184fb80) at /usr/src/sys/kern/kern_exec.c:218
218             error = kern_execve(td, &args, NULL);
(kgdb) print *uap
$4 = {fname_l_ = 0xfe010184fb80 "`\220\217\002\b", fname = 0x8028f9060
,
  fname_r_ = 0xfe010184fb88 "`¶ÿÿÿ\177", argv_l_ = 0xfe010184fb88
"`¶ÿÿÿ\177", argv = 0x7fffb660,
  argv_r_ = 0xfe010184fb90 "\bÜÿÿÿ\177", envv_l_ = 0xfe010184fb90
"\bÜÿÿÿ\177", envv = 0x7fffdc08,
  envv_r_ = 0xfe010184fb98 ""}
(kgdb)

for tid 101243:

(kgdb) f 15
#15 0x80508ccc in sys_execve (td=0xf800b642b500,
uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
218             error = kern_execve(td, &args, NULL);
(kgdb) print *uap
$5 = {fname_l_ = 0xfe010182cb80 "ÀÏ\205\002\b", fname = 0x80285cfc0 ,
  fname_r_ = 0xfe010182cb88 "`¶ÿÿÿ\177", argv_l_ = 0xfe010182cb88
"`¶ÿÿÿ\177", argv = 0x7fffb660,
  argv_r_ = 0xfe010182cb90 "\bÜÿÿÿ\177", envv_l_ = 0xfe010182cb90
"\bÜÿÿÿ\177", envv = 0x7fffdc08,
  envv_r_ = 0xfe010182cb98 ""}
(kgdb)


I think that you see garbage in those structures because they contain pointers
to userland data.

Hmm, I've just noticed another interesting thread:
Thread 668 (Thread 101245):
#0  sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x805614b1 in _sleep (ident=, lock=, priority=, wmesg=0x809c51bc
"vmpfw", sbt=0, pr=, flags=) at
/usr/src/sys/kern/kern_synch.c:229
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=) at /usr/src/sys/vm/vm_page.c:753
#5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40,
msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
#6  0x80886be9 in vm_fault_hold (map=, vaddr=, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
/usr/src/sys/vm/vm_fault.c:495
#7  0x80885448 in vm_fault (map=0xf80011d66000, vaddr=, fault_type=4 '\004', fault_flags=) at
/usr/src/sys/vm/vm_fault.c:273
#8  0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) at
/usr/src/sys/amd64/amd64/trap.c:741
#9  0x808d3386 in trap (frame=0xfe0101836c00) at
/usr/src/sys/amd64/amd64/trap.c:333
#10 0x808b7af1 in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:236


This thread is another program from the news system:
668 Thread 101245 (PID=49124: innfeed)  sched_switch (td=0xf800b642aa00,
newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973




I strongly suspect that this is the thread that we were looking for.
I think that it has the vnode lock in the shared mode while trying to fault in a
page.

Could you please check that by going to frame 6 and printing 'fs' and '*fs.vp'?
It'd be interesting to understand why this thread is waiting here.
So, please also print '*fs.m' and '*fs.object'.


No luck :-(
(kgdb) fr 6
#6  0x80886be9 in vm_fault_hold (map=, 
vaddr=, fault_type=4 '\004',

fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
495                     vm_page_sleep_if_busy(fs.m, "vmpfw");
(kgdb) print fs
Cannot access memory at address 0x1fa0
(kgdb)

Henri

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-14 Thread Andriy Gapon
On 13/11/2016 15:28, Henri Hennebert wrote:
> On 11/13/2016 11:06, Andriy Gapon wrote:
>> On 12/11/2016 14:40, Henri Hennebert wrote:
>>> I attatch it
>>
>> Thank you!
>> So, these two threads are trying to get the lock in the exclusive mode:
>> Thread 687 (Thread 101243):
>> #0  sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, 
>> flags=> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0x80561ae2 in mi_switch (flags=, newtd=0x0) 
>> at
>> /usr/src/sys/kern/kern_synch.c:455
>> #2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>> /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0x8052f854 in sleeplk (lk=, flags=> optimized out>, ilk=, wmesg=0x813be535 "zfs",
>> pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222
>> #4  0x8052f39d in __lockmgr_args (lk=, 
>> flags=> optimized out>, ilk=, wmesg=,
>> pri=, timo=, file=> out>, line=) at /usr/src/sys/kern/kern_lock.c:958
>> #5  0x80616a8c in vop_stdlock (ap=) at 
>> lockmgr.h:98
>> #6  0x8093784d in VOP_LOCK1_APV (vop=, a=> optimized out>) at vnode_if.c:2087
>> #7  0x8063c5b3 in _vn_lock (vp=, flags=548864,
>> file=, line=) at vnode_if.h:859
>> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
>> td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
>> #9  0x806118b9 in cache_lookup (dvp=, vpp=> optimized out>, cnp=, tsp=,
>> ticksp=) at /usr/src/sys/kern/vfs_cache.c:686
>> #10 0x806133dc in vfs_cache_lookup (ap=) at
>> /usr/src/sys/kern/vfs_cache.c:1081
>> #11 0x80935777 in VOP_LOOKUP_APV (vop=, a=> optimized out>) at vnode_if.c:127
>> #12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54
>> #13 0x8061c492 in namei (ndp=) at
>> /usr/src/sys/kern/vfs_lookup.c:306
>> #14 0x80509395 in kern_execve (td=, args=> optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
>> #15 0x80508ccc in sys_execve (td=0xf800b642b500,
>> uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
>> #16 0x808d449e in amd64_syscall (td=, traced=0) 
>> at
>> subr_syscall.c:135
>> #17 0x808b7ddb in Xfast_syscall () at
>> /usr/src/sys/amd64/amd64/exception.S:396
>>
>> Thread 681 (Thread 101147):
>> #0  sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, 
>> flags=> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0x80561ae2 in mi_switch (flags=, newtd=0x0) 
>> at
>> /usr/src/sys/kern/kern_synch.c:455
>> #2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>> /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0x8052f854 in sleeplk (lk=, flags=> optimized out>, ilk=, wmesg=0x813be535 "zfs",
>> pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222
>> #4  0x8052f39d in __lockmgr_args (lk=, 
>> flags=> optimized out>, ilk=, wmesg=,
>> pri=, timo=, file=> out>, line=) at /usr/src/sys/kern/kern_lock.c:958
>> #5  0x80616a8c in vop_stdlock (ap=) at 
>> lockmgr.h:98
>> #6  0x8093784d in VOP_LOCK1_APV (vop=, a=> optimized out>) at vnode_if.c:2087
>> #7  0x8063c5b3 in _vn_lock (vp=, flags=548864,
>> file=, line=) at vnode_if.h:859
>> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
>> td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
>> #9  0x806118b9 in cache_lookup (dvp=, vpp=> optimized out>, cnp=, tsp=,
>> ticksp=) at /usr/src/sys/kern/vfs_cache.c:686
>> #10 0x806133dc in vfs_cache_lookup (ap=) at
>> /usr/src/sys/kern/vfs_cache.c:1081
>> #11 0x80935777 in VOP_LOOKUP_APV (vop=, a=> optimized out>) at vnode_if.c:127
>> #12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54
>> #13 0x8061c492 in namei (ndp=) at
>> /usr/src/sys/kern/vfs_lookup.c:306
>> #14 0x80509395 in kern_execve (td=, args=> optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
>> #15 0x80508ccc in sys_execve (td=0xf80065f4e500,
>> uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
>> #16 0x808d449e in amd64_syscall (td=, traced=0) 
>> at
>> subr_syscall.c:135
>> #17 0x808b7ddb in Xfast_syscall () at
>> /usr/src/sys/amd64/amd64/exception.S:396
> 
> These 2 threads are innd processes. In core.txt.4:
> 
>8 14789 29165   0   24  4   40040   6612 zfs  DN- 0:00.00 [innd]
>8 29165 1   0   20  0   42496   6888 select   Ds- 0:01.33 [innd]
>8 49778 29165   0   24  4   40040   6900 zfs  DN- 0:00.00 [innd]
>8 82034 29165   0   24  4 132  0 zfs  DN- 0:00.00 [innd]
> 
> the corresponding 'info threads' entries are:
> 
>   687 Thread 101243 (PID=49778: innd)  sched_switch (td=0xf800b642b500,
> newtd=0xf8000285ea00, flags=) at
> /usr/src/sys/kern/sched_ule.c:1973
>   681 Thread 101147 (PID=14789: innd)  sched_switch (td=0xf80065f4e500,
> newtd=0xf8000285f000, flags=) at
> /usr/src/sys/kern/sched_ule.c:1973
>   669 Thread 101250 (PID=82034: innd)  sched_switch (td=0xf800b6429000,
> 

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-13 Thread Henri Hennebert

On 11/13/2016 14:28, Henri Hennebert wrote:

These 2 threads are innd processes. In core.txt.4:

   8 14789 29165   0   24  4   40040   6612 zfs  DN- 0:00.00 [innd]
   8 29165 1   0   20  0   42496   6888 select   Ds- 0:01.33 [innd]
   8 49778 29165   0   24  4   40040   6900 zfs  DN- 0:00.00 [innd]
   8 82034 29165   0   24  4 132  0 zfs  DN- 0:00.00 [innd]

the corresponding 'info threads' entries are:

  687 Thread 101243 (PID=49778: innd)  sched_switch
(td=0xf800b642b500, newtd=0xf8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973
  681 Thread 101147 (PID=14789: innd)  sched_switch
(td=0xf80065f4e500, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973
  669 Thread 101250 (PID=82034: innd)  sched_switch
(td=0xf800b6429000, newtd=0xf8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973
  665 Thread 101262 (PID=29165: innd)  sched_switch
(td=0xf800b6b54a00, newtd=0xf8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973


In case it may help, I had a look at innd. These processes use two execv calls:
one to execute /bin/sh and the other to re-execute innd itself:

/*
**  Re-exec ourselves.
*/
static const char *
CCxexec(char *av[])
{
    char *innd;
    char *p;
int i;

if (CCargv == NULL)
return "1 no argv!";

innd = concatpath(innconf->pathbin, "innd");
/* Get the pathname. */
p = av[0];
if (*p == '\0' || strcmp(p, "innd") == 0)
CCargv[0] = innd;
else
return "1 Bad value";

#ifdef DO_PERL
PLmode(Mode, OMshutdown, av[0]);
#endif
#ifdef DO_PYTHON
PYmode(Mode, OMshutdown, av[0]);
#endif
JustCleanup();
syslog(L_NOTICE, "%s execv %s", LogName, CCargv[0]);

    /* Close all fds to protect possible fd leaking across successive innds. */
for (i=3; i<30; i++)
close(i);

execv(CCargv[0], CCargv);
syslog(L_FATAL, "%s cant execv %s %m", LogName, CCargv[0]);
_exit(1);
/* NOTREACHED */
return "1 Exit failed";
}

The culprit may be /usr/local/news/bin/innd; remember that find is locked in
/usr/local/news/bin.

Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-13 Thread Henri Hennebert

On 11/13/2016 11:06, Andriy Gapon wrote:

On 12/11/2016 14:40, Henri Hennebert wrote:

I attatch it


Thank you!
So, these two threads are trying to get the lock in the exclusive mode:
Thread 687 (Thread 101243):
#0  sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=, flags=, ilk=, wmesg=0x813be535 "zfs",
pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=, flags=, ilk=, wmesg=,
pri=, timo=, file=, line=) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=, a=) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=, flags=548864,
file=, line=) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=, vpp=, cnp=, tsp=,
ticksp=) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=, a=) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=, args=, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf800b642b500,
uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=, traced=0) at
subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396

Thread 681 (Thread 101147):
#0  sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=, flags=, ilk=, wmesg=0x813be535 "zfs",
pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=, flags=, ilk=, wmesg=,
pri=, timo=, file=, line=) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=, a=) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=, flags=548864,
file=, line=) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=, vpp=, cnp=, tsp=,
ticksp=) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=, a=) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=, args=, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf80065f4e500,
uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=, traced=0) at
subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396


These 2 threads are innd processes. In core.txt.4:

   8 14789 29165   0   24  4   40040   6612 zfs  DN- 0:00.00 [innd]
   8 29165 1   0   20  0   42496   6888 select   Ds- 0:01.33 [innd]
   8 49778 29165   0   24  4   40040   6900 zfs  DN- 0:00.00 [innd]
   8 82034 29165   0   24  4 132  0 zfs  DN- 0:00.00 [innd]


the corresponding 'info threads' entries are:

  687 Thread 101243 (PID=49778: innd)  sched_switch (td=0xf800b642b500,
newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
  681 Thread 101147 (PID=14789: innd)  sched_switch (td=0xf80065f4e500,
newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
  669 Thread 101250 (PID=82034: innd)  sched_switch (td=0xf800b6429000,
newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
  665 Thread 101262 (PID=29165: innd)  sched_switch (td=0xf800b6b54a00,
newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973


So your missing thread must be 101250:

(kgdb) tid 101250
[Switching to thread 669 (Thread 101250)]#0  sched_switch 
(td=0xf800b6429000, newtd=0xf8000285ea00,

flags=) at /usr/src/sys/kern/sched_ule.c:1973
1973            cpuid = PCPU_GET(cpuid);
Current language:  auto; currently minimal
(kgdb) bt

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-13 Thread Andriy Gapon
On 12/11/2016 14:40, Henri Hennebert wrote:
> I attatch it

Thank you!
So, these two threads are trying to get the lock in the exclusive mode:
Thread 687 (Thread 101243):
#0  sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=, flags=, ilk=, wmesg=0x813be535 "zfs",
pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=, flags=, ilk=, wmesg=,
pri=, timo=, file=, line=) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=, a=) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=, flags=548864,
file=, line=) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=, vpp=, cnp=, tsp=,
ticksp=) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=, a=) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=, args=, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf800b642b500,
uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=, traced=0) at
subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396

Thread 681 (Thread 101147):
#0  sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=, flags=, ilk=, wmesg=0x813be535 "zfs",
pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=, flags=, ilk=, wmesg=,
pri=, timo=, file=, line=) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=, a=) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=, flags=548864,
file=, line=) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=, vpp=, cnp=, tsp=,
ticksp=) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=, a=) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=, args=, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf80065f4e500,
uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=, traced=0) at
subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396

And the original stuck thread wants to get the lock in the shared mode.
And there should be another thread that already holds the lock in the shared
mode.  But I am not able to identify it.  I wonder if the original thread could
be trying to get the lock recursively...

It would be interesting to get more details from thread 101112.
You can switch to it using the tid command, use 'fr' to select frames, 'info
local' and 'info args' to see which variables are available (not optimized out),
and then print any that look interesting.  It would be nice to get the file
path and the directory vnode on which the lookup is called.
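
Something along these lines, for example (the frame number is only illustrative; pick
whichever frame shows lookup()/namei() in the backtrace, and any of the prints may turn
out to be optimized out):

(kgdb) tid 101112
(kgdb) bt
(kgdb) fr 12
(kgdb) info args
(kgdb) info local
(kgdb) print *ndp
(kgdb) print ndp->ni_cnd.cn_pnbuf
(kgdb) print *ndp->ni_dvp

cn_pnbuf should contain the path being resolved and ni_dvp the directory vnode, when
they are available.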

Thank you.

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-12 Thread Andriy Gapon
On 11/11/2016 16:50, Henri Hennebert wrote:
> 
> 
> On 11/11/2016 12:24, Andriy Gapon wrote:
>>
>> At this stage I would try to get a system crash dump for post-mortem 
>> analysis.
>> There are a few way to do that.  You can enter ddb and then run 'dump' and
>> 'reset' commands.  Or you can just do `sysctl debug.kdb.panic=1`.
>> In either case, please double-check that your system has a dump device
>> configured.
>>
> It take some time to upload the dump...
> 
> You can find it at
> 
> http://tignes.restart.be/Xfer/

Could you please open the dump in kgdb and execute the following commands?

set logging on
set logging redirect on
set pagination off
thread apply all bt
quit

After that you should get gdb.txt file in the current directory.
I would like to see it.
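
In case it helps, the whole thing can be run in one session; assuming the dump was
saved as, e.g., /var/crash/vmcore.0 (adjust the paths to your setup):

[root@avoriaz ~]# kgdb /boot/kernel/kernel /var/crash/vmcore.0
(kgdb) set logging on
(kgdb) set logging redirect on
(kgdb) set pagination off
(kgdb) thread apply all bt
(kgdb) quit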

Thank you.

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-11 Thread Henri Hennebert



On 11/11/2016 12:24, Andriy Gapon wrote:


At this stage I would try to get a system crash dump for post-mortem analysis.
There are a few ways to do that.  You can enter ddb and then run 'dump' and
'reset' commands.  Or you can just do `sysctl debug.kdb.panic=1`.
In either case, please double-check that your system has a dump device 
configured.


It takes some time to upload the dump...

You can find it at

http://tignes.restart.be/Xfer/

Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-11 Thread Andriy Gapon
On 10/11/2016 21:41, Henri Hennebert wrote:
> On 11/10/2016 19:40, Andriy Gapon wrote:
>> On 10/11/2016 19:55, Henri Hennebert wrote:
>>>
>>>
>>> On 11/10/2016 18:33, Andriy Gapon wrote:
 On 10/11/2016 18:12, Henri Hennebert wrote:
> On 11/10/2016 16:54, Andriy Gapon wrote:
>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>> On 11/10/2016 15:00, Andriy Gapon wrote:
 Interesting.  I can not spot any suspicious thread that would hold the
 vnode
 lock.  Could you please run kgdb (just like that, no arguments), then
 execute
 'bt' command and then select a frame when _vn_lock is called with 'fr 
 N'
 command.  Then please 'print *vp' and share the result.

>>> I Think I miss something in your request:
>>
>> Oh, sorry!  The very first step should be 'tid 101112' to switch to the
>> correct
>> context.
>>
>
> (kgdb) fr 7
> #7  0x8063c5b3 in _vn_lock (vp=, 
> flags=2121728,

 "value optimized out" - not good

> file=,
> line=) at vnode_if.h:859
> 859vnode_if.h: No such file or directory.
> in vnode_if.h
> (kgdb) print *vp

 I am not sure if this output is valid, because of the message above.
 Could you please try to navigate to nearby frames and see if vp itself has 
 a
 valid value there.  If you can find such a frame please do *vp  there.

>>>
>>> Does this seems better?
>>
>> Yes!
>>
>>> (kgdb) fr 8
>>> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728,
>>> td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
>>> 2523if ((error = vn_lock(vp, flags)) != 0) {
>>> (kgdb) print *vp
>>> $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data =
>>> 0xf80049c1f420, v_mount = 0xf800093aa660,
>>>   v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 
>>> 0xf80049c2bb30},
>>> v_un = {vu_mount = 0x0, vu_socket = 0x0,
>>> vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev 
>>> =
>>> 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
>>> tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, 
>>> v_cache_dd =
>>> 0x0, v_lock = {lock_object = {
>>>   lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0,
>>> lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0,
>>> lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name =
>>> 0x8099e9e0 "vnode interlock", lo_flags = 16973824,
>>>   lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock =
>>> 0xf80049c2c068, v_actfreelist = {
>>> tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj 
>>> =
>>> {bo_lock = {lock_object = {
>>> lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 
>>> 86179840,
>>> lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
>>> bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, 
>>> bo_synclist =
>>> {le_next = 0x0, le_prev = 0x0},
>>> bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, 
>>> bo_clean =
>>> {bv_hd = {tqh_first = 0x0,
>>> tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 
>>> 0},
>>> bo_dirty = {bv_hd = {tqh_first = 0x0,
>>> tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 
>>> 0},
>>> bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
>>>   v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters =
>>> {tqh_first = 0x0, tqh_last = 0xf80049c2c188},
>>> rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
>>> v_holdcnt = 9, v_usecount = 6, v_iflag = 512,
>>>   v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
>>> (kgdb)
>>
>> flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
>> lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | 
>> LK_SHARED_WAITERS |
>> LK_SHARE
>>
>> So, here's what we have here: this thread tries to get a shared lock on the
>> vnode, the vnode is already locked in shared mode, but there is an exclusive
>> waiter (or, perhaps, multiple waiters).  So, this thread can not get the lock
>> because of the exclusive waiter.  And I do not see an easy way to identify 
>> that
>> waiter.
>>
>> In the procstat output that you provided earlier there was no other thread in
>> vn_lock.  Hmm, I see this:
>> procstat: sysctl: kern.proc.kstack: 14789: Device busy
>> procstat: sysctl: kern.proc.kstack: 82034: Device busy
>>
>> Could you please check what those two processes are (if they are still 
>> running)?
>> Perhaps try procstat for each of the pids several times.
>>

At this stage I would try to get a system crash dump for post-mortem analysis.
There are a few ways to do that.  You can enter ddb and then run 'dump' and
'reset' commands.  Or you can just do `sysctl debug.kdb.panic=1`.
In either case, please double-check that your system has a dump device 
configured.
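
If a dump device is not configured yet, the usual minimal setup (dumping to the swap
device, with savecore writing the dump to /var/crash on the next boot) is something like
adding this to /etc/rc.conf:

dumpdev="AUTO"

and activating it without a reboot:

[root@avoriaz ~]# service dumpon start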

> This 2 processes are the 2 

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/10/2016 19:40, Andriy Gapon wrote:

On 10/11/2016 19:55, Henri Hennebert wrote:



On 11/10/2016 18:33, Andriy Gapon wrote:

On 10/11/2016 18:12, Henri Hennebert wrote:

On 11/10/2016 16:54, Andriy Gapon wrote:

On 10/11/2016 17:20, Henri Hennebert wrote:

On 11/10/2016 15:00, Andriy Gapon wrote:

Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.


I think I'm missing something in your request:


Oh, sorry!  The very first step should be 'tid 101112' to switch to the correct
context.



(kgdb) fr 7
#7  0x8063c5b3 in _vn_lock (vp=, flags=2121728,


"value optimized out" - not good


file=,
line=) at vnode_if.h:859
859vnode_if.h: No such file or directory.
in vnode_if.h
(kgdb) print *vp


I am not sure if this output is valid, because of the message above.
Could you please try to navigate to nearby frames and see if vp itself has a
valid value there.  If you can find such a frame please do *vp  there.



Does this seem better?


Yes!


(kgdb) fr 8
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728,
td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
2523if ((error = vn_lock(vp, flags)) != 0) {
(kgdb) print *vp
$1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data =
0xf80049c1f420, v_mount = 0xf800093aa660,
  v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049c2bb30},
v_un = {vu_mount = 0x0, vu_socket = 0x0,
vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev =
0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, v_cache_dd =
0x0, v_lock = {lock_object = {
  lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0,
lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0,
lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name =
0x8099e9e0 "vnode interlock", lo_flags = 16973824,
  lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock =
0xf80049c2c068, v_actfreelist = {
tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj =
{bo_lock = {lock_object = {
lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840,
lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist =
{le_next = 0x0, le_prev = 0x0},
bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, bo_clean =
{bv_hd = {tqh_first = 0x0,
tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0},
bo_dirty = {bv_hd = {tqh_first = 0x0,
tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0},
bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
  v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters =
{tqh_first = 0x0, tqh_last = 0xf80049c2c188},
rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
v_holdcnt = 9, v_usecount = 6, v_iflag = 512,
  v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
(kgdb)


flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS |
LK_SHARE

So, here's what we have here: this thread tries to get a shared lock on the
vnode, the vnode is already locked in shared mode, but there is an exclusive
waiter (or, perhaps, multiple waiters).  So, this thread can not get the lock
because of the exclusive waiter.  And I do not see an easy way to identify that
waiter.

In the procstat output that you provided earlier there was no other thread in
vn_lock.  Hmm, I see this:
procstat: sysctl: kern.proc.kstack: 14789: Device busy
procstat: sysctl: kern.proc.kstack: 82034: Device busy

Could you please check what those two processes are (if they are still running)?
Perhaps try procstat for each of the pids several times.



These 2 processes are the 2 instances of the innd daemon (news server), which is
consistent with the directory /usr/local/news/bin.


[root@avoriaz ~]# procstat 14789
  PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN EMUL          COMM
14789 29165 29165 29165     0   1 root     zfs   FreeBSD ELF64 innd
[root@avoriaz ~]# procstat 82034
  PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN EMUL          COMM
82034 29165 29165 29165     0   1 root     zfs   FreeBSD ELF64 innd
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM             FD T V FLAGS    REF  OFFSET PRO NAME
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM             FD T V FLAGS    REF  OFFSET PRO NAME
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM             FD T V FLAGS

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 10/11/2016 19:55, Henri Hennebert wrote:
> 
> 
> On 11/10/2016 18:33, Andriy Gapon wrote:
>> On 10/11/2016 18:12, Henri Hennebert wrote:
>>> On 11/10/2016 16:54, Andriy Gapon wrote:
 On 10/11/2016 17:20, Henri Hennebert wrote:
> On 11/10/2016 15:00, Andriy Gapon wrote:
>> Interesting.  I can not spot any suspicious thread that would hold the 
>> vnode
>> lock.  Could you please run kgdb (just like that, no arguments), then 
>> execute
>> 'bt' command and then select a frame when _vn_lock is called with 'fr N'
>> command.  Then please 'print *vp' and share the result.
>>
> I Think I miss something in your request:

 Oh, sorry!  The very first step should be 'tid 101112' to switch to the 
 correct
 context.

>>>
>>> (kgdb) fr 7
>>> #7  0x8063c5b3 in _vn_lock (vp=, flags=2121728,
>>
>> "value optimized out" - not good
>>
>>> file=,
>>> line=) at vnode_if.h:859
>>> 859vnode_if.h: No such file or directory.
>>> in vnode_if.h
>>> (kgdb) print *vp
>>
>> I am not sure if this output is valid, because of the message above.
>> Could you please try to navigate to nearby frames and see if vp itself has a
>> valid value there.  If you can find such a frame please do *vp  there.
>>
> 
> Does this seems better?

Yes!

> (kgdb) fr 8
> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728,
> td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
> 2523if ((error = vn_lock(vp, flags)) != 0) {
> (kgdb) print *vp
> $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data =
> 0xf80049c1f420, v_mount = 0xf800093aa660,
>   v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 
> 0xf80049c2bb30},
> v_un = {vu_mount = 0x0, vu_socket = 0x0,
> vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev =
> 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
> tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, 
> v_cache_dd =
> 0x0, v_lock = {lock_object = {
>   lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0,
> lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0,
> lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name =
> 0x8099e9e0 "vnode interlock", lo_flags = 16973824,
>   lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock =
> 0xf80049c2c068, v_actfreelist = {
> tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj =
> {bo_lock = {lock_object = {
> lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840,
> lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
> bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist =
> {le_next = 0x0, le_prev = 0x0},
> bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, 
> bo_clean =
> {bv_hd = {tqh_first = 0x0,
> tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0},
> bo_dirty = {bv_hd = {tqh_first = 0x0,
> tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0},
> bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
>   v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters =
> {tqh_first = 0x0, tqh_last = 0xf80049c2c188},
> rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
> v_holdcnt = 9, v_usecount = 6, v_iflag = 512,
>   v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
> (kgdb)

flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS |
LK_SHARE

So, here's what we have here: this thread tries to get a shared lock on the
vnode, the vnode is already locked in shared mode, but there is an exclusive
waiter (or, perhaps, multiple waiters).  So, this thread can not get the lock
because of the exclusive waiter.  And I do not see an easy way to identify that
waiter.
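
For reference, the lk_lock word can be decoded mechanically.  A small sketch, with the
bit values as I read them in sys/sys/lockmgr.h on 11.x (double-check them against the
tree before trusting it):

#include <stdio.h>

/* Bit values assumed from sys/sys/lockmgr.h (11.x); verify before relying on them. */
#define LK_SHARE              0x01
#define LK_SHARED_WAITERS     0x02
#define LK_EXCLUSIVE_WAITERS  0x04
#define LK_EXCLUSIVE_SPINNERS 0x08
#define LK_FLAGMASK           (LK_SHARE | LK_SHARED_WAITERS | \
                               LK_EXCLUSIVE_WAITERS | LK_EXCLUSIVE_SPINNERS)
#define LK_SHARERS_SHIFT      4

int
main(void)
{
        unsigned long lk_lock = 23;     /* 0x17, from 'print *vp' above */

        printf("share=%d shared_waiters=%d excl_waiters=%d sharers=%lu\n",
            !!(lk_lock & LK_SHARE),
            !!(lk_lock & LK_SHARED_WAITERS),
            !!(lk_lock & LK_EXCLUSIVE_WAITERS),
            (lk_lock & ~(unsigned long)LK_FLAGMASK) >> LK_SHARERS_SHIFT);
        return (0);
}

It prints share=1 shared_waiters=1 excl_waiters=1 sharers=1, which is the situation
described above: exactly one shared holder, with both shared and exclusive waiters
queued behind it.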

In the procstat output that you provided earlier there was no other thread in
vn_lock.  Hmm, I see this:
procstat: sysctl: kern.proc.kstack: 14789: Device busy
procstat: sysctl: kern.proc.kstack: 82034: Device busy

Could you please check what those two processes are (if they are still running)?
Perhaps try procstat for each of the pids several times.

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert



On 11/10/2016 18:33, Andriy Gapon wrote:

On 10/11/2016 18:12, Henri Hennebert wrote:

On 11/10/2016 16:54, Andriy Gapon wrote:

On 10/11/2016 17:20, Henri Hennebert wrote:

On 11/10/2016 15:00, Andriy Gapon wrote:

Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.


I think I'm missing something in your request:


Oh, sorry!  The very first step should be 'tid 101112' to switch to the correct
context.



(kgdb) fr 7
#7  0x8063c5b3 in _vn_lock (vp=, flags=2121728,


"value optimized out" - not good


file=,
line=) at vnode_if.h:859
859vnode_if.h: No such file or directory.
in vnode_if.h
(kgdb) print *vp


I am not sure if this output is valid, because of the message above.
Could you please try to navigate to nearby frames and see if vp itself has a
valid value there.  If you can find such a frame please do *vp  there.



Does this seem better?

(kgdb) fr 8
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, 
td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523

2523if ((error = vn_lock(vp, flags)) != 0) {
(kgdb) print *vp
$1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, 
v_data = 0xf80049c1f420, v_mount = 0xf800093aa660,
  v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 
0xf80049c2bb30}, v_un = {vu_mount = 0x0, vu_socket = 0x0,
vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, 
le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, 
v_cache_dd = 0x0, v_lock = {lock_object = {
  lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data 
= 0, lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0,
lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 
0x8099e9e0 "vnode interlock", lo_flags = 16973824,
  lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 
0xf80049c2c068, v_actfreelist = {
tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, 
v_bufobj = {bo_lock = {lock_object = {
lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 
86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, 
bo_synclist = {le_next = 0x0, le_prev = 0x0},
bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, 
bo_clean = {bv_hd = {tqh_first = 0x0,
tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt 
= 0}, bo_dirty = {bv_hd = {tqh_first = 0x0,
tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt 
= 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
  v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = 
{tqh_first = 0x0, tqh_last = 0xf80049c2c188},
rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 
0, v_holdcnt = 9, v_usecount = 6, v_iflag = 512,

  v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
(kgdb)

Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 10/11/2016 18:12, Henri Hennebert wrote:
> On 11/10/2016 16:54, Andriy Gapon wrote:
>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>> On 11/10/2016 15:00, Andriy Gapon wrote:
 Interesting.  I can not spot any suspicious thread that would hold the 
 vnode
 lock.  Could you please run kgdb (just like that, no arguments), then 
 execute
 'bt' command and then select a frame when _vn_lock is called with 'fr N'
 command.  Then please 'print *vp' and share the result.

>>> I Think I miss something in your request:
>>
>> Oh, sorry!  The very first step should be 'tid 101112' to switch to the 
>> correct
>> context.
>>
> 
> (kgdb) fr 7
> #7  0x8063c5b3 in _vn_lock (vp=, flags=2121728,

"value optimized out" - not good

> file=,
> line=) at vnode_if.h:859
> 859vnode_if.h: No such file or directory.
> in vnode_if.h
> (kgdb) print *vp

I am not sure if this output is valid, because of the message above.
Could you please try to navigate to nearby frames and see if vp itself has a
valid value there.  If you can find such a frame please do *vp  there.

> $1 = {v_tag = 0x80faeb78 "â~\231\200", v_op = 0xf80009a41000,
> v_data = 0x0, v_mount = 0xf80009a41010,
>   v_nmntvnodes = {tqe_next = 0x0, tqe_prev = 0x80edc088}, v_un =
> {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0,
> vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0xf80009466e90, le_prev =
> 0x0}, v_cache_src = {lh_first = 0xfe010186d768},
>   v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfeb8a7c0}, v_cache_dd =
> 0xf8000284f000, v_lock = {lock_object = {
>   lo_name = 0xf8002c00ee80 "", lo_flags = 0, lo_data = 0, lo_witness =
> 0xf800068bd480},
> lk_lock = 1844673520268056, lk_exslpfail = 153715840, lk_timo = -2048,
> lk_pri = 0}, v_interlock = {lock_object = {
>   lo_name = 0x18af8  address>, lo_flags = 0, lo_data = 0,
>   lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 0x0, v_actfreelist =
> {tqe_next = 0x0, tqe_prev = 0xf80009ba05c0},
>   v_bufobj = {bo_lock = {lock_object = {lo_name = 0xf80009a41000 "",
> lo_flags = 1, lo_data = 0, lo_witness = 0x400ff},
>   rw_lock = 2}, bo_ops = 0x1, bo_object = 0xf80049c2c068,
> bo_synclist = {le_next = 0x813be535,
>   le_prev = 0x1}, bo_private = 0x0, __bo_vnode = 0x0, 
> bo_clean =
> {bv_hd = {tqh_first = 0x0, tqh_last = 0x0},
>   bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first =
> 0xf80088ac8d00, tqh_last = 0xf8003cc5b600},
>   bv_root = {pt_root = 2553161591}, bv_cnt = -1741805705}, bo_numoutput =
> 31, bo_flag = 0, bo_bsize = 0}, v_pollinfo = 0x0,
>   v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0xf88,
> tqh_last = 0x19cc}, rl_currdep = 0x3f8},
>   v_cstart = 16256, v_lasta = 679, v_lastw = 0, v_clen = 0, v_holdcnt = 0,
> v_usecount = 2369, v_iflag = 0, v_vflag = 0,
>   v_writecount = 0, v_hash = 0, v_type = VNON}
> (kgdb)
> 
> Thanks for your time
> 
> Henri


-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/10/2016 16:54, Andriy Gapon wrote:

On 10/11/2016 17:20, Henri Hennebert wrote:

On 11/10/2016 15:00, Andriy Gapon wrote:

Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.


I think I'm missing something in your request:


Oh, sorry!  The very first step should be 'tid 101112' to switch to the correct
context.



(kgdb) fr 7
#7  0x8063c5b3 in _vn_lock (vp=, 
flags=2121728, file=,

line=) at vnode_if.h:859
859 vnode_if.h: No such file or directory.
in vnode_if.h
(kgdb) print *vp
$1 = {v_tag = 0x80faeb78 "â~\231\200", v_op = 
0xf80009a41000, v_data = 0x0, v_mount = 0xf80009a41010,
  v_nmntvnodes = {tqe_next = 0x0, tqe_prev = 0x80edc088}, v_un 
= {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0,
vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0xf80009466e90, 
le_prev = 0x0}, v_cache_src = {lh_first = 0xfe010186d768},
  v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfeb8a7c0}, 
v_cache_dd = 0xf8000284f000, v_lock = {lock_object = {
  lo_name = 0xf8002c00ee80 "", lo_flags = 0, lo_data = 0, 
lo_witness = 0xf800068bd480},
lk_lock = 1844673520268056, lk_exslpfail = 153715840, lk_timo = 
-2048, lk_pri = 0}, v_interlock = {lock_object = {
  lo_name = 0x18af8 Bad address>, lo_flags = 0, lo_data = 0,
  lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 0x0, v_actfreelist = 
{tqe_next = 0x0, tqe_prev = 0xf80009ba05c0},
  v_bufobj = {bo_lock = {lock_object = {lo_name = 0xf80009a41000 
"", lo_flags = 1, lo_data = 0, lo_witness = 0x400ff},
  rw_lock = 2}, bo_ops = 0x1, bo_object = 
0xf80049c2c068, bo_synclist = {le_next = 0x813be535,
  le_prev = 0x1}, bo_private = 0x0, __bo_vnode = 0x0, 
bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0x0},
  bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = 
{tqh_first = 0xf80088ac8d00, tqh_last = 0xf8003cc5b600},
  bv_root = {pt_root = 2553161591}, bv_cnt = -1741805705}, 
bo_numoutput = 31, bo_flag = 0, bo_bsize = 0}, v_pollinfo = 0x0,
  v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 
0xf88, tqh_last = 0x19cc}, rl_currdep = 0x3f8},
  v_cstart = 16256, v_lasta = 679, v_lastw = 0, v_clen = 0, v_holdcnt = 
0, v_usecount = 2369, v_iflag = 0, v_vflag = 0,

  v_writecount = 0, v_hash = 0, v_type = VNON}
(kgdb)

Thanks for your time

Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 10/11/2016 17:20, Henri Hennebert wrote:
> On 11/10/2016 15:00, Andriy Gapon wrote:
>> Interesting.  I can not spot any suspicious thread that would hold the vnode
>> lock.  Could you please run kgdb (just like that, no arguments), then execute
>> 'bt' command and then select a frame when _vn_lock is called with 'fr N'
>> command.  Then please 'print *vp' and share the result.
>>
> I think I'm missing something in your request:

Oh, sorry!  The very first step should be 'tid 101112' to switch to the correct
context.

> [root@avoriaz ~]# kgdb
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
> done.
> 
> --- clip ---
> 
> Loaded symbols for /boot/kernel/accf_data.ko
> Reading symbols from /boot/kernel/daemon_saver.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/daemon_saver.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/daemon_saver.ko
> #0  sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, 
> flags=<value optimized out>)
> at /usr/src/sys/kern/sched_ule.c:1973
> 1973            cpuid = PCPU_GET(cpuid);
> (kgdb) bt
> #0  sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, 
> flags=<value optimized out>)
> at /usr/src/sys/kern/sched_ule.c:1973
> #1  0x80566b15 in tc_fill_vdso_timehands32 (vdso_th32=0x0) at
> /usr/src/sys/kern/kern_tc.c:2121
> #2  0x80555227 in timekeep_push_vdso () at
> /usr/src/sys/kern/kern_sharedpage.c:174
> #3  0x80566226 in tc_windup () at /usr/src/sys/kern/kern_tc.c:1426
> #4  0x804eaa41 in hardclock_cnt (cnt=1, usermode=<value optimized out>)
> at /usr/src/sys/kern/kern_clock.c:589
> #5  0x808fac74 in handleevents (now=<value optimized out>, fake=0) at
> /usr/src/sys/kern/kern_clocksource.c:223
> #6  0x808fb1d7 in timercb (et=0x8100cf20, arg=<value optimized out>) at /usr/src/sys/kern/kern_clocksource.c:352
> #7  0xf800b6429a00 in ?? ()
> #8  0x81051080 in vm_page_array ()
> #9  0x81051098 in vm_page_queue_free_mtx ()
> #10 0xfe0101818920 in ?? ()
> #11 0x805399c0 in __mtx_lock_sleep (c=<value optimized out>, tid=Error
> accessing memory address 0xffac: Bad address.
> ) at /usr/src/sys/kern/kern_mutex.c:590
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> (kgdb) q
> [root@avoriaz ~]#
> 
> I can't find the requested frame.
> 
> Henri


-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/10/2016 15:00, Andriy Gapon wrote:

On 10/11/2016 12:30, Henri Hennebert wrote:

On 11/10/2016 11:21, Andriy Gapon wrote:

On 09/11/2016 15:58, Eric van Gyzen wrote:

On 11/09/2016 07:48, Henri Hennebert wrote:

I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260:
Fri Nov  4 02:51:33 CET 2016
r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64

This system is exclusively running on zfs.

After 3 or 4 days, `periodic daily` is locked in the directory
/usr/local/news/bin

[root@avoriaz ~]# ps xa|grep find
85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune
-o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
  462  1  S+   0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
  PID COMMFD T V FLAGSREF  OFFSET PRO NAME
85656 find  text v r r---   -   - - /usr/bin/find
85656 find   cwd v d r---   -   - - /usr/local/news/bin
85656 find  root v d r---   -   - - /
85656 find 0 v c r---   3   0 - /dev/null
85656 find 1 p - rw--   1   0 - -
85656 find 2 v r -w--   7  17 - -
85656 find 3 v d r---   1   0 - /home/root
85656 find 4 v d r---   1   0 - /home/root
85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked.

After `shutdown -r now` the system remains locked after the line '0 0 0 0 0 0'

After a reset and reboot  I can access /usr/local/news/bin.

I delete this directory and reinstall the package `portupgrade -fu news/inn`

5 days later `periodic daily` is locked on the same directory :-o

Any idea?


I can't help with the deadlock, but someone who _can_ help will probably ask for
the output of "procstat -kk PID" with the PID of the "find" process.


In fact, it's procstat -kk -a.  With just one thread we would see that a thread
is blocked on something, but we won't see why that something can not be 
acquired.



I attach the result,


Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.


I think I'm missing something in your request:

[root@avoriaz ~]# kgdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.

Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/zfs.ko.debug...done.

done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.

done.

--- clip ---

Loaded symbols for /boot/kernel/accf_data.ko
Reading symbols from /boot/kernel/daemon_saver.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/daemon_saver.ko.debug...done.

done.
Loaded symbols for /boot/kernel/daemon_saver.ko
#0  sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, 
flags=<value optimized out>)

at /usr/src/sys/kern/sched_ule.c:1973
1973            cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, 
flags=<value optimized out>)

at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80566b15 in tc_fill_vdso_timehands32 (vdso_th32=0x0) at 
/usr/src/sys/kern/kern_tc.c:2121
#2  0x80555227 in timekeep_push_vdso () at 
/usr/src/sys/kern/kern_sharedpage.c:174

#3  0x80566226 in tc_windup () at /usr/src/sys/kern/kern_tc.c:1426
#4  0x804eaa41 in hardclock_cnt (cnt=1, usermode=<value optimized out>) at /usr/src/sys/kern/kern_clock.c:589
#5  0x808fac74 in handleevents (now=<value optimized out>, 
fake=0) at /usr/src/sys/kern/kern_clocksource.c:223
#6  0x808fb1d7 in timercb (et=0x8100cf20, arg=<value optimized out>) at /usr/src/sys/kern/kern_clocksource.c:352

#7  0xf800b6429a00 in ?? ()
#8  0x81051080 in vm_page_array ()
#9  0x81051098 in vm_page_queue_free_mtx ()
#10 0xfe0101818920 in ?? ()
#11 0x805399c0 in __mtx_lock_sleep (c=<value optimized out>, 
tid=Error accessing memory address 0xffac: Bad address.
) at /usr/src/sys/kern/kern_mutex.c:590
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb) q
[root@avoriaz ~]#

I can't find the requested frame.

Henri

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 10/11/2016 12:30, Henri Hennebert wrote:
> On 11/10/2016 11:21, Andriy Gapon wrote:
>> On 09/11/2016 15:58, Eric van Gyzen wrote:
>>> On 11/09/2016 07:48, Henri Hennebert wrote:
 I encounter a strange deadlock on

 FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 
 r308260:
 Fri Nov  4 02:51:33 CET 2016
 r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64

 This system is exclusively running on zfs.

 After 3 or 4 days, `periodic daily` is locked in the directory
 /usr/local/news/bin

 [root@avoriaz ~]# ps xa|grep find
 85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) 
 -prune
 -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
   462  1  S+   0:00.00 grep find
 [root@avoriaz ~]# procstat -f 85656
   PID COMMFD T V FLAGSREF  OFFSET PRO NAME
 85656 find  text v r r---   -   - - /usr/bin/find
 85656 find   cwd v d r---   -   - - /usr/local/news/bin
 85656 find  root v d r---   -   - - /
 85656 find 0 v c r---   3   0 - /dev/null
 85656 find 1 p - rw--   1   0 - -
 85656 find 2 v r -w--   7  17 - -
 85656 find 3 v d r---   1   0 - /home/root
 85656 find 4 v d r---   1   0 - /home/root
 85656 find 5 v d rn--   1 533545184 - 
 /usr/local/news/bin
 [root@avoriaz ~]#

 If I try `ls /usr/local/news/bin` it is also locked.

 After `shutdown -r now` the system remains locked after the line '0 0 0 0 0 0'

 After a reset and reboot  I can access /usr/local/news/bin.

 I delete this directory and reinstall the package `portupgrade -fu 
 news/inn`

 5 days later `periodic daily` is locked on the same directory :-o

 Any idea?
>>>
>>> I can't help with the deadlock, but someone who _can_ help will probably 
>>> ask for
>>> the output of "procstat -kk PID" with the PID of the "find" process.
>>
>> In fact, it's procstat -kk -a.  With just one thread we would see that a 
>> thread
>> is blocked on something, but we won't see why that something can not be 
>> acquired.
>>
>>
> I attach the result,

Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.



-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/10/2016 11:21, Andriy Gapon wrote:

On 09/11/2016 15:58, Eric van Gyzen wrote:

On 11/09/2016 07:48, Henri Hennebert wrote:

I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260:
Fri Nov  4 02:51:33 CET 2016
r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64

This system is exclusively running on zfs.

After 3 or 4 days, `periodic daily` is locked in the directory 
/usr/local/news/bin

[root@avoriaz ~]# ps xa|grep find
85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune
-o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
  462  1  S+   0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
  PID COMMFD T V FLAGSREF  OFFSET PRO NAME
85656 find  text v r r---   -   - - /usr/bin/find
85656 find   cwd v d r---   -   - - /usr/local/news/bin
85656 find  root v d r---   -   - - /
85656 find 0 v c r---   3   0 - /dev/null
85656 find 1 p - rw--   1   0 - -
85656 find 2 v r -w--   7  17 - -
85656 find 3 v d r---   1   0 - /home/root
85656 find 4 v d r---   1   0 - /home/root
85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked.

After `shutdown -r now` the system remains locked after the line '0 0 0 0 0 0'

After a reset and reboot  I can access /usr/local/news/bin.

I delete this directory and reinstall the package `portupgrade -fu news/inn`

5 days later `periodic daily` is locked on the same directory :-o

Any idea?


I can't help with the deadlock, but someone who _can_ help will probably ask for
the output of "procstat -kk PID" with the PID of the "find" process.


In fact, it's procstat -kk -a.  With just one thread we would see that a thread
is blocked on something, but we won't see why that something can not be 
acquired.



I attach the result,

Henri
[root@avoriaz ~]# procstat -kk -a
  PIDTID COMM TDNAME   KSTACK   
0 10 kernel   swapper  mi_switch+0xd2 
sleepq_timedwait+0x3a _sleep+0x281 swapper+0x464 btext+0x2c 
0 19 kernel   kqueue_ctx taskq mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100012 kernel   aiod_kick taskq  mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100013 kernel   thread taskq mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100018 kernel   firmware taskq   mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100022 kernel   acpi_task_0  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100023 kernel   acpi_task_1  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100024 kernel   acpi_task_2  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100025 kernel   em0 que  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100026 kernel   em0 txq  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100027 kernel   em1 taskqmi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100060 kernel   mca taskqmi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100061 kernel   system_taskq_0   mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100062 kernel   system_taskq_1   mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100063 kernel   dbu_evictmi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100072 kernel   CAM taskqmi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100086 kernel   if_config_tqg_0  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd gtaskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100087 kernel   if_io_tqg_0  mi_switch+0xd2 sleepq_wait+0x3a 

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 09/11/2016 15:58, Eric van Gyzen wrote:
> On 11/09/2016 07:48, Henri Hennebert wrote:
>> I encounter a strange deadlock on
>>
>> FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 
>> r308260:
>> Fri Nov  4 02:51:33 CET 2016
>> r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64
>>
>> This system is exclusively running on zfs.
>>
>> After 3 or 4 days, `periodic daily` is locked in the directory 
>> /usr/local/news/bin
>>
>> [root@avoriaz ~]# ps xa|grep find
>> 85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) 
>> -prune
>> -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
>>   462  1  S+   0:00.00 grep find
>> [root@avoriaz ~]# procstat -f 85656
>>   PID COMMFD T V FLAGSREF  OFFSET PRO NAME
>> 85656 find  text v r r---   -   - - /usr/bin/find
>> 85656 find   cwd v d r---   -   - - /usr/local/news/bin
>> 85656 find  root v d r---   -   - - /
>> 85656 find 0 v c r---   3   0 - /dev/null
>> 85656 find 1 p - rw--   1   0 - -
>> 85656 find 2 v r -w--   7  17 - -
>> 85656 find 3 v d r---   1   0 - /home/root
>> 85656 find 4 v d r---   1   0 - /home/root
>> 85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
>> [root@avoriaz ~]#
>>
>> If I try `ls /usr/local/news/bin` it is also locked.
>>
>> After `shutdown -r now` the system remains locked after the line '0 0 0 0 0 0'
>>
>> After a reset and reboot  I can access /usr/local/news/bin.
>>
>> I delete this directory and reinstall the package `portupgrade -fu news/inn`
>>
>> 5 days later `periodic daily` is locked on the same directory :-o
>>
>> Any idea?
> 
> I can't help with the deadlock, but someone who _can_ help will probably ask 
> for
> the output of "procstat -kk PID" with the PID of the "find" process.

In fact, it's procstat -kk -a.  With just one thread we would see that a thread
is blocked on something, but we won't see why that something can not be 
acquired.
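
For example, one rough way to capture and then narrow that output (the temporary file name and the grep pattern are only a convenient suggestion, not anything procstat itself requires):

procstat -kk 85656
procstat -kk -a > /var/tmp/procstat-kk-a.txt
grep -E '_vn_lock|vget' /var/tmp/procstat-kk-a.txt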


-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/09/2016 19:23, Thierry Thomas wrote:

On Wed, 9 Nov 2016 at 15:03:49 +0100, Henri Hennebert wrote:


[root@avoriaz ~]# procstat -kk 85656
   PIDTID COMM TDNAME KSTACK
85656 101112 find -mi_switch+0xd2
sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c
VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679
vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572
kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb


It looks similar to the problem reported in PR 205163
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205163

It may be caused by too-small values for some of the vfs.zfs.arc* tunables.
Could you please list the sysctl values for vfs.zfs.arc_max and the others?

Regards,


[root@avoriaz ~]# sysctl vfs.zfs
vfs.zfs.trim.max_interval: 1
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.enabled: 1
vfs.zfs.vol.unmap_enabled: 1
vfs.zfs.vol.recursive: 0
vfs.zfs.vol.mode: 1
vfs.zfs.version.zpl: 5
vfs.zfs.version.spa: 5000
vfs.zfs.version.acl: 1
vfs.zfs.version.ioctl: 6
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
vfs.zfs.sync_pass_rewrite: 2
vfs.zfs.sync_pass_dont_compress: 5
vfs.zfs.sync_pass_deferred_free: 2
vfs.zfs.zio.exclude_metadata: 0
vfs.zfs.zio.use_uma: 1
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_replay_disable: 0
vfs.zfs.min_auto_ashift: 9
vfs.zfs.max_auto_ashift: 13
vfs.zfs.vdev.trim_max_pending: 1
vfs.zfs.vdev.bio_delete_disable: 0
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.trim_max_active: 64
vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.scrub_max_active: 2
vfs.zfs.vdev.scrub_min_active: 1
vfs.zfs.vdev.async_write_max_active: 10
vfs.zfs.vdev.async_write_min_active: 1
vfs.zfs.vdev.async_read_max_active: 3
vfs.zfs.vdev.async_read_min_active: 1
vfs.zfs.vdev.sync_write_max_active: 10
vfs.zfs.vdev.sync_write_min_active: 10
vfs.zfs.vdev.sync_read_max_active: 10
vfs.zfs.vdev.sync_read_min_active: 10
vfs.zfs.vdev.max_active: 1000
vfs.zfs.vdev.async_write_active_max_dirty_percent: 60
vfs.zfs.vdev.async_write_active_min_dirty_percent: 30
vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1
vfs.zfs.vdev.mirror.non_rotating_inc: 0
vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576
vfs.zfs.vdev.mirror.rotating_seek_inc: 5
vfs.zfs.vdev.mirror.rotating_inc: 0
vfs.zfs.vdev.trim_on_init: 1
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 0
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.metaslabs_per_vdev: 200
vfs.zfs.txg.timeout: 5
vfs.zfs.space_map_blksz: 4096
vfs.zfs.spa_slop_shift: 5
vfs.zfs.spa_asize_inflation: 24
vfs.zfs.deadman_enabled: 1
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_synctime_ms: 100
vfs.zfs.debug_flags: 0
vfs.zfs.recover: 0
vfs.zfs.spa_load_verify_data: 1
vfs.zfs.spa_load_verify_metadata: 1
vfs.zfs.spa_load_verify_maxinflight: 1
vfs.zfs.ccw_retry_interval: 300
vfs.zfs.check_hostid: 1
vfs.zfs.mg_fragmentation_threshold: 85
vfs.zfs.mg_noalloc_threshold: 0
vfs.zfs.condense_pct: 200
vfs.zfs.metaslab.bias_enabled: 1
vfs.zfs.metaslab.lba_weighting_enabled: 1
vfs.zfs.metaslab.fragmentation_factor_enabled: 1
vfs.zfs.metaslab.preload_enabled: 1
vfs.zfs.metaslab.preload_limit: 3
vfs.zfs.metaslab.unload_delay: 8
vfs.zfs.metaslab.load_pct: 50
vfs.zfs.metaslab.min_alloc_size: 33554432
vfs.zfs.metaslab.df_free_pct: 4
vfs.zfs.metaslab.df_alloc_threshold: 131072
vfs.zfs.metaslab.debug_unload: 0
vfs.zfs.metaslab.debug_load: 0
vfs.zfs.metaslab.fragmentation_threshold: 70
vfs.zfs.metaslab.gang_bang: 16777217
vfs.zfs.free_bpobj_enabled: 1
vfs.zfs.free_max_blocks: 18446744073709551615
vfs.zfs.no_scrub_prefetch: 0
vfs.zfs.no_scrub_io: 0
vfs.zfs.resilver_min_time_ms: 3000
vfs.zfs.free_min_time_ms: 1000
vfs.zfs.scan_min_time_ms: 1000
vfs.zfs.scan_idle: 50
vfs.zfs.scrub_delay: 4
vfs.zfs.resilver_delay: 2
vfs.zfs.top_maxinflight: 32
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.max_distance: 8388608
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.delay_scale: 50
vfs.zfs.delay_min_dirty_percent: 60
vfs.zfs.dirty_data_sync: 67108864
vfs.zfs.dirty_data_max_percent: 10
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max: 373664153
vfs.zfs.max_recordsize: 1048576
vfs.zfs.mdcomp_disable: 0
vfs.zfs.nopwrite_enabled: 1
vfs.zfs.dedup.prefetch: 1
vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 24202240
vfs.zfs.mfu_ghost_metadata_lsize: 136404992
vfs.zfs.mfu_ghost_size: 160607232
vfs.zfs.mfu_data_lsize: 449569280
vfs.zfs.mfu_metadata_lsize: 102724608
vfs.zfs.mfu_size: 714202624
vfs.zfs.mru_ghost_data_lsize: 874834432
vfs.zfs.mru_ghost_metadata_lsize: 387692032
vfs.zfs.mru_ghost_size: 1262526464
vfs.zfs.mru_data_lsize: 151275008
vfs.zfs.mru_metadata_lsize: 13547008
vfs.zfs.mru_size: 322614272
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 2916352
vfs.zfs.l2arc_norw: 1

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-09 Thread Thierry Thomas
On Wed, 9 Nov 2016 at 15:03:49 +0100, Henri Hennebert wrote:

> [root@avoriaz ~]# procstat -kk 85656
>PIDTID COMM TDNAME KSTACK
> 85656 101112 find -mi_switch+0xd2 
> sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c 
> VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679 
> vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572 
> kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb

It looks similar to the problem reported in PR 205163
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205163

It may be caused by too-small values for some of the vfs.zfs.arc* tunables.
Could you please list the sysctl values for vfs.zfs.arc_max and the others?
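
For example, something along these lines should cover the usual suspects (the exact list of OIDs is only a suggestion):

sysctl vfs.zfs.arc_max vfs.zfs.arc_min vfs.zfs.arc_meta_limit
sysctl kstat.zfs.misc.arcstats.size

If the limits turn out to be too small, they can be raised via /boot/loader.conf, e.g. (the value below is purely illustrative, 4 GiB in bytes):

vfs.zfs.arc_max="4294967296"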

Regards,
-- 
Th. Thomas.




Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-09 Thread Henri Hennebert

On 11/09/2016 14:58, Eric van Gyzen wrote:

On 11/09/2016 07:48, Henri Hennebert wrote:

I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260:
Fri Nov  4 02:51:33 CET 2016
r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64

This system is exclusively running on zfs.

After 3 or 4 days, `periodic daily` is locked in the directory 
/usr/local/news/bin

[root@avoriaz ~]# ps xa|grep find
85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune
-o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
   462  1  S+   0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
   PID COMMFD T V FLAGSREF  OFFSET PRO NAME
85656 find  text v r r---   -   - - /usr/bin/find
85656 find   cwd v d r---   -   - - /usr/local/news/bin
85656 find  root v d r---   -   - - /
85656 find 0 v c r---   3   0 - /dev/null
85656 find 1 p - rw--   1   0 - -
85656 find 2 v r -w--   7  17 - -
85656 find 3 v d r---   1   0 - /home/root
85656 find 4 v d r---   1   0 - /home/root
85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked.

After `shutdown -r now` the system remains locked after the line '0 0 0 0 0 0'

After a reset and reboot  I can access /usr/local/news/bin.

I delete this directory and reinstall the package `portupgrade -fu news/inn`

5 days later `periodic daily` is locked on the same directory :-o

Any idea?

I can't help with the deadlock, but someone who _can_ help will probably ask for
the output of "procstat -kk PID" with the PID of the "find" process.

Eric

[root@avoriaz ~]# procstat -kk 85656
  PIDTID COMM TDNAME KSTACK
85656 101112 find -mi_switch+0xd2 
sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c 
VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679 
vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572 
kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb




Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-09 Thread Eric van Gyzen
On 11/09/2016 07:48, Henri Hennebert wrote:
> I encounter a strange deadlock on
> 
> FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 
> r308260:
> Fri Nov  4 02:51:33 CET 2016
> r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64
> 
> This system is exclusively running on zfs.
> 
> After 3 or 4 days, `periodic daily` is locked in the directory 
> /usr/local/news/bin
> 
> [root@avoriaz ~]# ps xa|grep find
> 85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune
> -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
>   462  1  S+   0:00.00 grep find
> [root@avoriaz ~]# procstat -f 85656
>   PID COMMFD T V FLAGSREF  OFFSET PRO NAME
> 85656 find  text v r r---   -   - - /usr/bin/find
> 85656 find   cwd v d r---   -   - - /usr/local/news/bin
> 85656 find  root v d r---   -   - - /
> 85656 find 0 v c r---   3   0 - /dev/null
> 85656 find 1 p - rw--   1   0 - -
> 85656 find 2 v r -w--   7  17 - -
> 85656 find 3 v d r---   1   0 - /home/root
> 85656 find 4 v d r---   1   0 - /home/root
> 85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
> [root@avoriaz ~]#
> 
> If I try `ls /usr/local/news/bin` it is also locked.
> 
> After `shutdown -r now` the system remains locked after the line '0 0 0 0 0 0'
> 
> After a reset and reboot  I can access /usr/local/news/bin.
> 
> I delete this directory and reinstall the package `portupgrade -fu news/inn`
> 
> 5 days later `periodic daily` is locked on the same directory :-o
> 
> Any idea?

I can't help with the deadlock, but someone who _can_ help will probably ask for
the output of "procstat -kk PID" with the PID of the "find" process.

Eric


Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-09 Thread Henri Hennebert

I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 
r308260: Fri Nov  4 02:51:33 CET 2016 
r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64


This system is exclusively running on zfs.

After 3 or 4 days, `periodic daily` is locked in the directory 
/usr/local/news/bin


[root@avoriaz ~]# ps xa|grep find
85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) 
-prune -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam

  462  1  S+   0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
  PID COMMFD T V FLAGSREF  OFFSET PRO NAME
85656 find  text v r r---   -   - - /usr/bin/find
85656 find   cwd v d r---   -   - - /usr/local/news/bin
85656 find  root v d r---   -   - - /
85656 find 0 v c r---   3   0 - /dev/null
85656 find 1 p - rw--   1   0 - -
85656 find 2 v r -w--   7  17 - -
85656 find 3 v d r---   1   0 - /home/root
85656 find 4 v d r---   1   0 - /home/root
85656 find 5 v d rn--   1 533545184 - 
/usr/local/news/bin

[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked.

After `shutdown -r now` the system remains locked after the line '0 0 0 0 0 0'


After a reset and reboot  I can access /usr/local/news/bin.

I delete this directory and reinstall the package `portupgrade -fu news/inn`

5 days later `periodic daily` is locked on the same directory :-o

Any idea?

Henri