On Tue, Jul 05, 2016 at 02:14:57PM +0200, Martin Pieuchot wrote:
> Without more information it's hard to find what could be the reason
> for this crash. Being able to reproduce the crash easily is the key
> to debugging. Can you do that?
while :; do (chrome &); sleep 8; pkill chrome; done
It takes up to a minute to trigger it. I should mention, however, since
I only just noticed (sorry about this), that I am running with your
scheduler hack from some months ago:
diff --git a/sys/kern/sched_bsd.c b/sys/kern/sched_bsd.c
index 8b318df..7a0a1f7 100644
--- a/sys/kern/sched_bsd.c
+++ b/sys/kern/sched_bsd.c
@@ -298,7 +298,16 @@ yield(void)
 	int s;
 
 	SCHED_LOCK(s);
-	p->p_priority = p->p_usrpri;
+	/*
+	 * If one of the threads of a multi-threaded process called
+	 * sched_yield(2), drop its priority to ensure its siblings
+	 * can make some progress.
+	 */
+	if (TAILQ_FIRST(&p->p_p->ps_threads) == p &&
+	    TAILQ_NEXT(p, p_thr_link) == NULL)
+		p->p_priority = p->p_usrpri;
+	else
+		p->p_priority = min(MAXPRI, p->p_usrpri * 2);
 	p->p_stat = SRUN;
 	setrunqueue(p);
 	p->p_ru.ru_nvcsw++;
--
2.8.1
Without this diff, I cannot trigger it in any reasonable time.
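To make the effect of the hack concrete, here is a minimal userland sketch of the priority it assigns in yield(). It is not kernel code: yield_priority() and SKETCH_MAXPRI are names made up for illustration, and the value 127 is my assumption for MAXPRI in <sys/param.h>. Larger numbers mean lower priority, so a yielding thread that has siblings is pushed towards the back of the run queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value of MAXPRI from <sys/param.h>; verify against your tree. */
#define SKETCH_MAXPRI 127

/*
 * Hypothetical helper mirroring the branch the diff adds to yield():
 * a thread without siblings keeps its user priority, any other thread
 * has it doubled and capped at MAXPRI.
 */
static int
yield_priority(int usrpri, bool has_siblings)
{
	int doubled;

	assert(usrpri >= 0 && usrpri <= SKETCH_MAXPRI);

	if (!has_siblings)
		return usrpri;		/* single-threaded: unchanged behaviour */

	doubled = usrpri * 2;		/* larger value == lower priority */
	return doubled < SKETCH_MAXPRI ? doubled : SKETCH_MAXPRI;
}
```

With these assumptions a lone thread at priority 50 stays at 50, a thread with siblings drops to 100, and anything above 63 saturates at 127. That changes how yielding chrome threads interleave, which may be why the race is so much easier to hit with the hack applied.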
> Well the panic you reported, I don't know if you encountered any other,
> was triggered by a NULL dereference. That means that the defer heap was
> containing at least a NULL pointer.
Yes, you are right; I misinterpreted the output.
> My bet is that unp_discard() is called twice for a set of fps, because as
> you can see the set of fps is cleared after being enqueued.
>
> If you can reproduce the crash, could you run with the diff below and
> see if you can trigger the panic?
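For what it's worth, the double-call theory quoted above is easy to picture in userland. The sketch below only models the description given (copy the set of fps into a deferral record, then clear the caller's array); struct deferral, discard_sketch() and null_slots() are hypothetical names, and I have not checked them against the real unp_discard().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a deferral record holding a set of fps. */
struct deferral {
	size_t	 n;
	void	*fp[];
};

/*
 * Copy the pointer set into a freshly allocated record, then clear the
 * caller's array, as the set is described to be cleared once enqueued.
 * Returns NULL if the allocation fails.
 */
static struct deferral *
discard_sketch(void **rp, size_t nfds)
{
	struct deferral *d;
	size_t i;

	assert(rp != NULL);

	d = malloc(sizeof(*d) + sizeof(*rp) * nfds);
	if (d == NULL)
		return NULL;

	d->n = nfds;
	for (i = 0; i < nfds; i++) {
		d->fp[i] = rp[i];	/* hand the pointer to the record */
		rp[i] = NULL;		/* clear the caller's copy */
	}
	return d;
}

/* Count how many slots of a record are NULL. */
static size_t
null_slots(const struct deferral *d)
{
	size_t i, count = 0;

	for (i = 0; i < d->n; i++)
		if (d->fp[i] == NULL)
			count++;
	return count;
}
```

Calling discard_sketch() twice on the same array gives a first record with valid pointers and a second record made only of NULLs, which would match a NULL dereference when the deferred entries are processed later.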
I manually copied the panic output:
panic: tell me what you really really want
Stopped at Debugger+0x9: leave
   TID    PID   UID   PRFLAGS  PFLAGS  CPU  COMMAND
 24513  24513  1000  0x200003       0    1  chrome
 32334  32334  1000  0x300003       0    3  chrome
 89099  89099     0   0x14000   0x200    0  reaper
 91496  91496     0   0x14000   0x200    2  softnet
Debugger() at Debugger+0x9
panic() at panic+0xfe
unp_discard() at unp_discard+0xd7
unp_scan() at unp_scan+0x62
sorflush() at sorflush+0x17f
sofree() at sofree+0xbc
soclose() at soclose+0x92
soo_close() at soo_close+0x1c
fdrop() at fdrop+0x2c
closef() at closef+0xcb
sys_close() at sys_close+0x4f
syscall() at syscall+0x27b
--- syscall (number 6) ---
end of kernel
end trace frame: 0x7f7ffffcca40, count: 3
0xa94666aa29a:
mach ddbcpu 0
Stopped at Debugger+0x9: leave
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
__mp_lock() at __mp_lock+0x48
uvm_pause() at uvm_pause+0x54
uvm_unmap_detach() at uvm_unmap_detach+0xc5
uvm_map_teardown() at uvm_map_teardown+0x16d
uvmspace_free() at uvmspace_free+0x36
uvm_exit() at uvm_exit+0x15
reaper() at reaper+0xdb
end trace frame: 0x0, count: 5
mach ddbcpu 1
Stopped at Debugger+0x9: leave
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
__mp_lock() at __mp_lock+0x48
trap() at trap+0x1df
--- trap (number 6) ---
end of kernel
end trace frame: 0x73d06, count: 10
0xda44e004d6c
mach ddbcpu 2
Stopped at Debugger+0x9: leave
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
mtx_enter() at mtx_enter+0x25
taskq_sleep() at taskq_sleep+0x2d
taskq_next_work() at taskq_next_work+0x47
taskq_thread() at taskq_thread+0x61
end trace frame: 0x0, count: 8