On Mon, Jun 08, 2009 at 08:25:50PM +0300, Mikolaj Golub wrote: > After recent upgrade (on June 3) of my 7-stable host (WITNESS is enabled, the > previous build was Apr 26), I have been experienced panics when starting some > our home made application: > > Architecture: i386 > Architecture Version: 2 > Dump Length: 73797632B (70 MB) > Blocksize: 512 > Dumptime: Mon Jun 8 15:29:14 2009 > Hostname: zhuzha.ua1 > Magic: FreeBSD Kernel Dump > Version String: FreeBSD 7.2-STABLE #18: Wed Jun 3 14:28:49 EEST 2009 > [email protected]:/usr/obj/usr/src/sys/DEBUG > Panic String: Assertion lock == sq->sq_lock failed at > /usr/src/sys/kern/subr_sleepqueue.c:327 > Dump Parity: 1277108796 > Bounds: 18 > > Narrowing down the problem I have written a simple test program (in attaches) > that models in simplified way the behaviour or our application and reproduces > the crash. There are two threads. One of the threads is doing vfork/exec to > start child process while the other is monitoring the status of the child > calling wait4(). > > At the moment of the crash the parent vfork thread is sleeping on > wchan=0xc4d17ae0 (td->td_proc) with lock=0xc4d178b8 waiting for the child to > exec. The parent wait4 thread with lock=0xc4d17b70 and the same wchan calls > sleepq_lookup(wchan) to check if the wait channel already have a sleep queue > associated with it, finds the queue of the vfork thread, tries to insert the > current thread into this sleep queue and fails on assertion lock==sq->sq_lock. > > wait4 treead: > > (kgdb) thr 128 > [Switching to thread 128 (Thread 100164)]#0 doadump () at pcpu.h:196 > 196 in pcpu.h > (kgdb) bt > #0 doadump () at pcpu.h:196 > #1 0xc0496509 in db_fncall (dummy1=-1061924641, dummy2=0, dummy3=-1, > dummy4=0xe69a089c "") > at /usr/src/sys/ddb/db_command.c:516 > #2 0xc0496abf in db_command (last_cmdp=0xc0c9ab54, cmd_table=0x0, dopager=0) > at /usr/src/sys/ddb/db_command.c:413 > #3 0xc0496b34 in db_command_script (command=0xc0c9baa4 "call doadump") > at /usr/src/sys/ddb/db_command.c:484 > #4 0xc049a3a0 in db_script_exec (scriptname=0xe69a09a8 "kdb.enter.panic", > warnifnotfound=Variable "warnifnotfound" is not available. > ) > at /usr/src/sys/ddb/db_script.c:302 > #5 0xc049a47d in db_script_kdbenter (eventname=0xc0b8fa85 "panic") at > /usr/src/sys/ddb/db_script.c:324 > #6 0xc0498388 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:227 > #7 0xc0810d36 in kdb_trap (type=3, code=0, tf=0xe69a0ae0) at > /usr/src/sys/kern/subr_kdb.c:524 > #8 0xc0add6ab in trap (frame=0xe69a0ae0) at /usr/src/sys/i386/i386/trap.c:688 > #9 0xc0ac159b in calltrap () at /usr/src/sys/i386/i386/exception.s:159 > #10 0xc0810eba in kdb_enter_why (why=0xc0b8fa85 "panic", msg=0xc0b8fa85 > "panic") at cpufunc.h:60 > #11 0xc07e4216 in panic (fmt=0xc0b5ecd3 "Assertion %s failed at %s:%d") > at /usr/src/sys/kern/kern_shutdown.c:557 > #12 0xc08194dc in sleepq_add (wchan=0xc4d17ae0, lock=0xc4d17b70, > wmesg=0xc0b959c1 "wait", flags=256, > queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:327 > #13 0xc07ec94b in _sleep (ident=0xc4d17ae0, lock=Variable "lock" is not > available. > ) at /usr/src/sys/kern/kern_synch.c:203 > #14 0xc07bd8e6 in kern_wait (td=0xc41e8480, pid=-1, status=0xe69a0c2c, > options=Variable "options" is not available. > ) > at /usr/src/sys/kern/kern_exit.c:902 > #15 0xc07bd95b in wait4 (td=0xc41e8480, uap=0xe69a0cfc) at > /usr/src/sys/kern/kern_exit.c:688 > #16 0xc0adce23 in syscall (frame=0xe69a0d38) at > /usr/src/sys/i386/i386/trap.c:1090 > #17 0xc0ac1600 in Xint0x80_syscall () at > /usr/src/sys/i386/i386/exception.s:255 > #18 0x00000033 in ?? () > Previous frame inner to this frame (corrupt stack?) > (kgdb) fr 12 > #12 0xc08194dc in sleepq_add (wchan=0xc4d17ae0, lock=0xc4d17b70, > wmesg=0xc0b959c1 "wait", flags=256, > queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:327 > 327 MPASS(lock == sq->sq_lock); > (kgdb) p lock > $1 = (struct lock_object *) 0xc4d17b70 > (kgdb) p sq->sq_lock > $2 = (struct lock_object *) 0xc4d178b8 > (kgdb) p *lock > $3 = {lo_name = 0xc0b8e532 "process lock", lo_type = 0xc0b8e532 "process > lock", lo_flags = 21168128, > lo_witness_data = {lod_list = {stqe_next = 0xc0ce4840}, lod_witness = > 0xc0ce4840}} > (kgdb) p *sq->sq_lock > $4 = {lo_name = 0xc0b8e532 "process lock", lo_type = 0xc0b8e532 "process > lock", lo_flags = 21168128, > lo_witness_data = {lod_list = {stqe_next = 0xc0ce4840}, lod_witness = > 0xc0ce4840}} > > vfork thread of the parent process: > > (kgdb) thr 127 > [Switching to thread 127 (Thread 100162)]#0 sched_switch (td=0xc41e86c0, > newtd=Variable "newtd" is not available. > ) > at /usr/src/sys/kern/sched_ule.c:1944 > 1944 cpuid = PCPU_GET(cpuid); > (kgdb) bt > #0 sched_switch (td=0xc41e86c0, newtd=Variable "newtd" is not available. > ) at /usr/src/sys/kern/sched_ule.c:1944 > #1 0xc07ec4c3 in mi_switch (flags=Variable "flags" is not available. > ) at /usr/src/sys/kern/kern_synch.c:444 > #2 0xc0818bd2 in sleepq_switch (wchan=0xc4d17ae0) at > /usr/src/sys/kern/subr_sleepqueue.c:497 > #3 0xc0819800 in sleepq_wait (wchan=0xc4d17ae0) at > /usr/src/sys/kern/subr_sleepqueue.c:580 > #4 0xc07eca69 in _sleep (ident=0xc4d17ae0, lock=0xc4d178b8, priority=92, > wmesg=0xc0b8bacb "ppwait", > timo=0) at /usr/src/sys/kern/kern_synch.c:230 > #5 0xc07c0631 in fork1 (td=0xc41e86c0, flags=-2147483596, pages=0, > procp=0xe699ac78) > at /usr/src/sys/kern/kern_fork.c:747 > #6 0xc07c07c9 in vfork (td=0xc41e86c0, uap=0xe699acfc) at > /usr/src/sys/kern/kern_fork.c:124 > #7 0xc0adce23 in syscall (frame=0xe699ad38) at > /usr/src/sys/i386/i386/trap.c:1090 > #8 0xc0ac1600 in Xint0x80_syscall () at > /usr/src/sys/i386/i386/exception.s:255 > #9 0x00000033 in ?? () > Previous frame inner to this frame (corrupt stack?) > > And here is the child process: > > (kgdb) thr 129 > [Switching to thread 129 (Thread 100163)]#0 sched_switch (td=0xc4d1d900, > newtd=Variable "newtd" is not available. > ) > at /usr/src/sys/kern/sched_ule.c:1944 > 1944 cpuid = PCPU_GET(cpuid); > (kgdb) bt > #0 sched_switch (td=0xc4d1d900, newtd=Variable "newtd" is not available. > ) at /usr/src/sys/kern/sched_ule.c:1944 > #1 0xc07ec4c3 in mi_switch (flags=Variable "flags" is not available. > ) at /usr/src/sys/kern/kern_synch.c:444 > #2 0xc0818bd2 in sleepq_switch (wchan=0xc0ccdac4) at > /usr/src/sys/kern/subr_sleepqueue.c:497 > #3 0xc0818e6f in sleepq_catch_signals (wchan=0xc0ccdac4) at > /usr/src/sys/kern/subr_sleepqueue.c:417 > #4 0xc0819647 in sleepq_timedwait_sig (wchan=0xc0ccdac4) at > /usr/src/sys/kern/subr_sleepqueue.c:631 > #5 0xc07eca31 in _sleep (ident=0xc0ccdac4, lock=0x0, priority=348, > wmesg=0xc0b90bcd "nanslp", timo=2) > at /usr/src/sys/kern/kern_synch.c:224 > #6 0xc07f3f51 in kern_nanosleep (td=0xc4d1d900, rqt=0xe699dc64, > rmt=0xe699dc6c) > at /usr/src/sys/kern/kern_time.c:373 > #7 0xc07f594f in nanosleep (td=0xc4d1d900, uap=0xe699dcfc) at > /usr/src/sys/kern/kern_time.c:415 > #8 0xc0adce23 in syscall (frame=0xe699dd38) at > /usr/src/sys/i386/i386/trap.c:1090 > #9 0xc0ac1600 in Xint0x80_syscall () at > /usr/src/sys/i386/i386/exception.s:255 > #10 0x00000033 in ?? () > Previous frame inner to this frame (corrupt stack?) > (kgdb) p &td->td_proc.p_mtx > $5 = (struct mtx *) 0xc4d178b8 > > Actually, I am not sure that the problem is observed only on the recent > STABLE. It might have not been triggered by our application before the > upgrade. Currently I don't have the system with some old STABLE to check this > running the test program.
This is fixed in HEAD by r185647. I did not merged it to 7, because I do not believe that somebody runs stable with INVARIANTS on :).
pgpvQ8yzD4svi.pgp
Description: PGP signature
