On Tue, Jul 05, 2016 at 02:14:57PM +0200, Martin Pieuchot wrote:
> Without more information it's hard to find what could be the reason
> for this crash.  Being able to reproduce the crash easily is the key
> to debugging.  Can you do that?

while :; do (chrome &); sleep 8; pkill chrome; done

It takes up to a minute to trigger it.  I should mention, however, that
I only just noticed (sorry about this) that I am running with your
scheduler hack from some months ago.

diff --git a/sys/kern/sched_bsd.c b/sys/kern/sched_bsd.c
index 8b318df..7a0a1f7 100644
--- a/sys/kern/sched_bsd.c
+++ b/sys/kern/sched_bsd.c
@@ -298,7 +298,16 @@ yield(void)
        int s;
 
        SCHED_LOCK(s);
-       p->p_priority = p->p_usrpri;
+       /*
+        * If one of the threads of a multi-threaded process called
+        * sched_yield(2), drop its priority to ensure its siblings
+        * can make some progress.
+        */
+       if (TAILQ_FIRST(&p->p_p->ps_threads) == p &&
+           TAILQ_NEXT(p, p_thr_link) == NULL)
+               p->p_priority = p->p_usrpri;
+       else
+               p->p_priority = min(MAXPRI, p->p_usrpri * 2);
        p->p_stat = SRUN;
        setrunqueue(p);
        p->p_ru.ru_nvcsw++;
-- 
2.8.1

Without this diff, I cannot trigger it in any reasonable time.
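
For anyone reading along, here is a minimal userland sketch of the
priority rule the diff above applies in yield().  It is not kernel code.
The names sketch_yield_priority and SKETCH_MAXPRI are made up for
illustration, the value 127 for MAXPRI is my assumption, and the boolean
stands in for the TAILQ_FIRST/TAILQ_NEXT check on ps_threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: stands in for the kernel's MAXPRI, taken to be 127 here. */
#define SKETCH_MAXPRI 127

/*
 * Model of the priority chosen by the patched yield().
 * usrpri stands in for p->p_usrpri; single_threaded stands in for
 * "p is the first thread of its process and has no next sibling".
 */
static int
sketch_yield_priority(int usrpri, bool single_threaded)
{
	int doubled;

	assert(usrpri >= 0 && usrpri <= SKETCH_MAXPRI);

	/* A lone thread keeps its user priority, as before the diff. */
	if (single_threaded)
		return usrpri;

	/* A thread with siblings gets min(MAXPRI, usrpri * 2). */
	doubled = usrpri * 2;
	return doubled < SKETCH_MAXPRI ? doubled : SKETCH_MAXPRI;
}
```

Since a numerically larger priority is a worse one, a yielding thread of
a multi-threaded process ends up behind its siblings, which is probably
what changes the timing enough to expose the race.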

> Well the panic you reported, I don't know if you encountered any other,
> was triggered by a NULL dereference.  That means that the defer heap was
> containing at least a NULL pointer.

Yes you are right, I misinterpreted the output.

> My bet is that unp_discard() is called twice for a set of fps, because as
> you can see the set of fps is cleared after being enqueued.
> 
> If you can reproduce the crash, could you run with the diff below and
> see if you can trigger the panic?
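
To make the hypothesis concrete, here is a toy userland model of it, not
the real unp_discard().  All names (sketch_defer, sketch_discard,
sketch_count_null) are hypothetical.  It only assumes what is described
above: each fp is put on a deferred list and its slot is then cleared.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXDEFER 16

/* Toy stand-in for the deferred list ("defer heap"). */
struct sketch_defer {
	const void *items[SKETCH_MAXDEFER];
	size_t n;
};

/* Enqueue every slot of fps[] on the deferred list, then clear the slot. */
static void
sketch_discard(struct sketch_defer *d, const void **fps, size_t nfps)
{
	size_t i;

	for (i = 0; i < nfps; i++) {
		assert(d->n < SKETCH_MAXDEFER);
		d->items[d->n++] = fps[i];
		fps[i] = NULL;
	}
}

/* Count the NULL entries that ended up on the deferred list. */
static size_t
sketch_count_null(const struct sketch_defer *d)
{
	size_t i, nulls = 0;

	for (i = 0; i < d->n; i++)
		if (d->items[i] == NULL)
			nulls++;
	return nulls;
}
```

In this model, a second sketch_discard() call over the same fps[] array
enqueues only NULL pointers, which would match a NULL dereference when
the deferred list is processed later.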

I manually copied the panic output:

panic: tell me what you really really want
Stopped at Debugger+0x9: leave
TID     PID     UID     PRFLAGS         PFLAGS  CPU     COMMAND
24513   24513   1000    0x200003        0       1       chrome
32334   32334   1000    0x300003        0       3       chrome
89099   89099   0       0x14000         0x200   0       reaper
91496   91496   0       0x14000         0x200   2       softnet
Debugger() at Debugger+0x9
panic() at panic+0xfe
unp_discard() at unp_discard+0xd7
unp_scan() at unp_scan+0x62
sorflush() at sorflush+0x17f
sofree() at sofree+0xbc
soclose() at soclose+0x92
soo_close() at soo_close+0x1c
fdrop() at fdrop+0x2c
closef() at closef+0xcb
sys_close() at sys_close+0x4f
syscall() at syscall+0x27b
--- syscall (number 6) ---
end of kernel
end trace frame: 0x7f7ffffcca40, count: 3
0xa94666aa29a:

mach ddbcpu 0

Stopped at Debugger+0x9: leave
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
__mp_lock() at __mp_lock+0x48
uvm_pause() at uvm_pause+0x54
uvm_unmap_detach() at uvm_unmap_detach+0xc5
uvm_map_teardown() at uvm_map_teardown+0x16d
uvmspace_free() at uvmspace_free+0x36
uvm_exit() at uvm_exit+0x15
reaper() at reaper+0xdb
end trace frame: 0x0, count: 5

mach ddbcpu 1

Stopped at Debugger+0x9: leave
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
__mp_lock() at __mp_lock+0x48
trap() at trap+0x1df
--- trap (number 6) ---
end of kernel
end trace frame: 0x73d06, count: 10
0xda44e004d6c

mach ddbcpu 2

Stopped at Debugger+0x9: leave
Debugger() at Debugger+0x9
x86_ipi_handler() at x86_ipi_handler+0x76
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1c
--- interrupt ---
mtx_enter() at mtx_enter+0x25
taskq_sleep() at taskq_sleep+0x2d
taskq_next_work() at taskq_next_work+0x47
taskq_thread() at taskq_thread+0x61
end trace frame: 0x0, count: 8
