Hi,

I'm finally able to hit the bug with all traces enabled. Please see the attachment for the traces. It seems to me that the return statement around /home/mai4/work/dOS/src/dOS/l4ka/kernel/src/api/v4/interrupt.cc:265 doesn't clean up properly.
Please let me know if you need any help from me. Thanks very much.

Haohui

On Fri, Apr 23, 2010 at 4:45 PM, Mai, Haohui <haohui....@gmail.com> wrote:
> Here is another run triggering this bug:
>
> Assertion irq_tcb->get_state().is_halted() failed in file
> /home/mai4/work/dOS/src/dOS/l4ka/kernel/src/api/v4/interrupt.cc, line 213
> (fn=ffffffffc0607664)
> --- "KD# assert" ---
> --------------------------------- (eip=ffffffffc0618ba5,
> esp=fffffffe80040f10) ---
> > showtcb
> tcb/tid/name [current]: IRQ_11
>
> === TCB: fffffffe8000b000 === ID: 0000000b00000001 =
> ffffffffffffffc0/ffffffffc0cd2400 === PRIO: 0xff === CPU: 0 ===
> UIP: fffffffe8000b000 queues: rSwl wait : NIL_THRD
> :NIL_THRD space: 0000000000000000
> USP: 0000000000000000 tstate: POLLING ready: NIL_THRD
> :NIL_THRD pdir : 0000000000cb3000
> KSP: fffffffe8000bf98 sndhd : NIL_THRD send :
> IRQ_000000000011:IRQ_000000000011 pager: NIL_THRD
> total quant: 0us, ts length : 10000us,
> curr ts: 10000us
> abs timeout: 0us, rel timeout: 0us
> sens prio: 255, delay: max=0us, curr=0us
> resources: 0000000000000000 [] flags: 0000000000000000 [t]
> partner: 0000004000000001, saved partner: NIL_THRD, saved state: ABORTED ,
> scheduler: 0000004000000001
> > showtcb
> tcb/tid/name [current]: 0000004000000001
> === TCB: fffffffe80040000 === ID: 0000004000000001 =
> 0000000080000200/ffffffffc0cb5000 === PRIO: 0x64 === CPU: 0 ===
> UIP: 00000000020001df queues: Rswl wait :
> 0000004100000001:0000004100000001 space: ffffffffc0cb2000
> USP: 00007fff7ffffcf8 tstate: RUNNING ready:
> 0000004400000001:0000003e00000001 pdir : 0000000000cb3000
> KSP: fffffffe80040e10 sndhd : IRQ_000000000011 send : NIL_THRD
> :NIL_THRD pager: ROOTTASK
> total quant: 0us, ts length : 10000us,
> curr ts: 3087us
> abs timeout: 50661820us, rel timeout: - 15554645us
> sens prio: 100, delay: max=0us, curr=0us
> resources: 0000000000000000 [] flags: 0000000000000000 [t]
> partner: NIL_THRD, saved partner: NIL_THRD, saved state: ABORTED ,
> scheduler: ROOTTASK
>
> Haohui
>
> On Fri, Apr 23, 2010 at 3:15 PM, Mai, Haohui <haohui....@gmail.com> wrote:
>
>> It's pretty difficult to reproduce this problem since it only happens once
>> a while, Here is some information when the kernel goes wild:
>>
>> Assertion irq_tcb->get_state().is_halted() failed in file
>> /home/mai4/work/dOS/src/dOS/l4ka/kernel/src/api/v4/interrupt.cc, line 213
>> (fn=ffffffffc0607664)
>> --- "KD# assert" ---
>> --------------------------------- (eip=ffffffffc0618ba5,
>> esp=fffffffe8003ef10) ---
>> > showqueue
>>
>> [255]: (SIGMA0:0) (ROOTTASK:0) (IRQ_12:0) (IRQ_11:0)
>> [100]: (0000003b00000001:0) (0000003c00000001:0) (0000003d00000001:0)
>> 0000003e00000001:0 (0000003f00000001:0) 0000004000000001:0
>> 0000004100000001:0 0000004200000001:0 (0000004300000001:0)
>> 0000004400000001:0
>> [000]: (0000001800000001:0)
>> idle : IDLETHRD
>>
>> > showtcb
>> tcb/tid/name [current]: IRQ_11
>>
>> === TCB: fffffffe8000b000 === ID: 0000000b00000001 =
>> ffffffffffffffc0/ffffffffc0cd2400 === PRIO: 0xff === CPU: 0 ===
>> UIP: fffffffe8000b000 queues: rswl wait : NIL_THRD
>> :NIL_THRD space: 0000000000000000
>> USP: 0000000000000000 tstate: WAIT_FE ready: NIL_THRD
>> :NIL_THRD pdir : 0000000000cb3000
>> KSP: fffffffe8000bf98 sndhd : NIL_THRD send : NIL_THRD
>> :NIL_THRD pager: NIL_THRD
>> total quant: 0us, ts length : 10000us,
>> curr ts: 10000us
>> abs timeout: 0us, rel timeout: 0us
>> sens prio: 255, delay: max=0us, curr=0us
>> resources: 0000000000000000 [] flags: 0000000000000000 [t]
>> partner: 0000004000000001, saved partner: NIL_THRD, saved state: ABORTED ,
>> scheduler: 0000004000000001
>>
>> > showtcb
>> tcb/tid/name [current]: 4000000001
>> === TCB: fffffffe80040000 === ID: 0000004000000001 =
>> 0000000080000200/ffffffffc0cb5000 === PRIO: 0x64 === CPU: 0 ===
>> UIP: 00000000020001df queues: Rswl wait :
>> 0000004100000001:0000004100000001 space: ffffffffc0cb2000
>> USP: 00007fff7ffffd08 tstate: RUNNING ready:
>> 0000003e00000001:0000004200000001 pdir : 0000000000cb3000
>> KSP: fffffffe80040ee8 sndhd : NIL_THRD send : NIL_THRD
>> :NIL_THRD pager: ROOTTASK
>> total quant: 0us, ts length : 10000us,
>> curr ts: 8476us
>> abs timeout: 50581747us, rel timeout: - 9969065us
>> sens prio: 100, delay: max=0us, curr=0us
>> resources: 0000000000000000 [] flags: 0000000000000000 [t]
>> partner: IRQ_11, saved partner: NIL_THRD, saved state: ABORTED ,
>> scheduler: ROOTTASK
>> >
>>
>> I'm wondering why the IRQ thread is in WAIT_FE state.. Do you have an
>> idea?
>>
>> Haohui
>>
>> On Fri, Apr 16, 2010 at 2:15 AM, Jan Stoess <sto...@kit.edu> wrote:
>>
>>> > Actually I'm hitting this bug once a while under qemu. It seems to me
>>> that
>>> > handle_interrupt and irq_thread() are executed on different CPUs.
>>> >
>>> > What do you need me to do to clarify the problem?
>>>
>>> Can you dump the tracebuffer output when the assert hits and send the
>>> dump over here?
>>>
>>> --
>>> Jan Stoess
>>> KIT/UKa System Architecture Group
>>> Phone: +49 (721) 608 4056
>>> Fax: +49 (721) 608 7664
>>> http://os.ibds.kit.edu/stoess
>>>
>>>
>>
>
irq-halted-state.log.bz2
Description: BZip2 compressed data