Re: Race condition in interrupt handling

Mai, Haohui Fri, 23 Apr 2010 22:45:28 -0700

Hi,

I was finally able to hit the bug with all traces enabled; the traces are
in the attachment. It seems to me that the return statement around
/home/mai4/work/dOS/src/dOS/l4ka/kernel/src/api/v4/interrupt.cc:265
exits without cleaning up properly.
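
To illustrate the kind of failure I suspect, here is a toy model in C++.
It is not the L4Ka code: State, IrqThread, toy_handle_interrupt,
toy_finish_delivery and second_interrupt_ok are all made-up names, and the
real logic in interrupt.cc may well differ. The sketch only shows how an
early return that skips restoring the halted state would make the next
interrupt trip an is_halted()-style check.

```cpp
#include <cassert>

// Toy model only. Every name here is hypothetical and is NOT taken from
// the L4Ka sources; it just mirrors the shape of the suspected bug.
enum class State { Halted, Polling };

struct IrqThread {
    State state = State::Halted;
};

// Models interrupt delivery. The kernel expects the IRQ thread to be
// halted when a new interrupt arrives. Returns false in the situation
// where the real kernel would hit its is_halted() assertion.
bool toy_handle_interrupt(IrqThread& irq) {
    if (irq.state != State::Halted) {
        return false;  // the assertion would fire here
    }
    irq.state = State::Polling;  // IRQ thread starts sending to the handler
    return true;
}

// Models the end of delivery. With early_return set, the function leaves
// without putting the thread back into the halted state (the suspected
// missing cleanup).
void toy_finish_delivery(IrqThread& irq, bool early_return) {
    if (early_return) {
        return;  // state stays Polling
    }
    irq.state = State::Halted;
}

// Delivers one interrupt, finishes it, then reports whether a second
// interrupt could be delivered without tripping the check.
bool second_interrupt_ok(bool early_return) {
    IrqThread irq;
    toy_handle_interrupt(irq);
    toy_finish_delivery(irq, early_return);
    return toy_handle_interrupt(irq);
}
```

In this model, second_interrupt_ok(false) succeeds, while
second_interrupt_ok(true) fails on the second interrupt because the
thread was never returned to the halted state.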


Please let me know if you need anything else from me. Thanks very much.

Haohui

On Fri, Apr 23, 2010 at 4:45 PM, Mai, Haohui <haohui....@gmail.com> wrote:

> Here is another run triggering this bug:
>
> Assertion irq_tcb->get_state().is_halted() failed in file
> /home/mai4/work/dOS/src/dOS/l4ka/kernel/src/api/v4/interrupt.cc, line 213
> (fn=ffffffffc0607664)
> --- "KD# assert" ---
> --------------------------------- (eip=ffffffffc0618ba5,
> esp=fffffffe80040f10) ---
> > showtcb
> tcb/tid/name [current]: IRQ_11
>
> === TCB: fffffffe8000b000 === ID: 0000000b00000001 =
> ffffffffffffffc0/ffffffffc0cd2400 === PRIO: 0xff === CPU: 0 ===
> UIP: fffffffe8000b000   queues: rSwl              wait : NIL_THRD
>  :NIL_THRD           space: 0000000000000000
> USP: 0000000000000000   tstate: POLLING           ready: NIL_THRD
>  :NIL_THRD           pdir : 0000000000cb3000
> KSP: fffffffe8000bf98   sndhd : NIL_THRD          send :
> IRQ_000000000011:IRQ_000000000011   pager: NIL_THRD
> total quant:                   0us, ts length  :                  10000us,
> curr ts:            10000us
> abs timeout:                   0us, rel timeout:                      0us
> sens prio: 255, delay: max=0us, curr=0us
> resources: 0000000000000000 []   flags: 0000000000000000 [t]
> partner: 0000004000000001, saved partner: NIL_THRD, saved state: ABORTED ,
> scheduler: 0000004000000001
> > showtcb
> tcb/tid/name [current]: 0000004000000001
> === TCB: fffffffe80040000 === ID: 0000004000000001 =
> 0000000080000200/ffffffffc0cb5000 === PRIO: 0x64 === CPU: 0 ===
> UIP: 00000000020001df   queues: Rswl              wait :
> 0000004100000001:0000004100000001   space: ffffffffc0cb2000
> USP: 00007fff7ffffcf8   tstate: RUNNING           ready:
> 0000004400000001:0000003e00000001   pdir : 0000000000cb3000
> KSP: fffffffe80040e10   sndhd : IRQ_000000000011  send : NIL_THRD
>  :NIL_THRD           pager: ROOTTASK
> total quant:                   0us, ts length  :                  10000us,
> curr ts:             3087us
> abs timeout:            50661820us, rel timeout:       -        15554645us
> sens prio: 100, delay: max=0us, curr=0us
> resources: 0000000000000000 []   flags: 0000000000000000 [t]
> partner: NIL_THRD, saved partner: NIL_THRD, saved state: ABORTED ,
> scheduler: ROOTTASK
>
> Haohui
>
> On Fri, Apr 23, 2010 at 3:15 PM, Mai, Haohui <haohui....@gmail.com> wrote:
>
>> It's pretty difficult to reproduce this problem since it only happens once
>> in a while. Here is some information from when the kernel goes wild:
>>
>> Assertion irq_tcb->get_state().is_halted() failed in file
>> /home/mai4/work/dOS/src/dOS/l4ka/kernel/src/api/v4/interrupt.cc, line 213
>> (fn=ffffffffc0607664)
>> --- "KD# assert" ---
>> --------------------------------- (eip=ffffffffc0618ba5,
>> esp=fffffffe8003ef10) ---
>> > showqueue
>>
>> [255]: (SIGMA0:0) (ROOTTASK:0) (IRQ_12:0) (IRQ_11:0)
>> [100]: (0000003b00000001:0) (0000003c00000001:0) (0000003d00000001:0)
>> 0000003e00000001:0 (0000003f00000001:0) 0000004000000001:0
>> 0000004100000001:0 0000004200000001:0 (0000004300000001:0)
>> 0000004400000001:0
>> [000]: (0000001800000001:0)
>> idle : IDLETHRD
>>
>> > showtcb
>> tcb/tid/name [current]: IRQ_11
>>
>> === TCB: fffffffe8000b000 === ID: 0000000b00000001 =
>> ffffffffffffffc0/ffffffffc0cd2400 === PRIO: 0xff === CPU: 0 ===
>> UIP: fffffffe8000b000   queues: rswl              wait : NIL_THRD
>>  :NIL_THRD           space: 0000000000000000
>> USP: 0000000000000000   tstate: WAIT_FE           ready: NIL_THRD
>>  :NIL_THRD           pdir : 0000000000cb3000
>> KSP: fffffffe8000bf98   sndhd : NIL_THRD          send : NIL_THRD
>>  :NIL_THRD           pager: NIL_THRD
>> total quant:                   0us, ts length  :                  10000us,
>> curr ts:            10000us
>> abs timeout:                   0us, rel timeout:                      0us
>> sens prio: 255, delay: max=0us, curr=0us
>> resources: 0000000000000000 []   flags: 0000000000000000 [t]
>> partner: 0000004000000001, saved partner: NIL_THRD, saved state: ABORTED ,
>> scheduler: 0000004000000001
>>
>> > showtcb
>> tcb/tid/name [current]: 4000000001
>> === TCB: fffffffe80040000 === ID: 0000004000000001 =
>> 0000000080000200/ffffffffc0cb5000 === PRIO: 0x64 === CPU: 0 ===
>> UIP: 00000000020001df   queues: Rswl              wait :
>> 0000004100000001:0000004100000001   space: ffffffffc0cb2000
>> USP: 00007fff7ffffd08   tstate: RUNNING           ready:
>> 0000003e00000001:0000004200000001   pdir : 0000000000cb3000
>> KSP: fffffffe80040ee8   sndhd : NIL_THRD          send : NIL_THRD
>>  :NIL_THRD           pager: ROOTTASK
>> total quant:                   0us, ts length  :                  10000us,
>> curr ts:             8476us
>> abs timeout:            50581747us, rel timeout:       -         9969065us
>> sens prio: 100, delay: max=0us, curr=0us
>> resources: 0000000000000000 []   flags: 0000000000000000 [t]
>> partner: IRQ_11, saved partner: NIL_THRD, saved state: ABORTED ,
>> scheduler: ROOTTASK
>> >
>>
>> I'm wondering why the IRQ thread is in the WAIT_FE state. Do you have any
>> idea?
>>
>> Haohui
>>
>> On Fri, Apr 16, 2010 at 2:15 AM, Jan Stoess <sto...@kit.edu> wrote:
>>
>>> > Actually I'm hitting this bug once in a while under qemu. It seems to
>>> > me that handle_interrupt() and irq_thread() are executed on different
>>> > CPUs.
>>> >
>>> > What do you need me to do to clarify the problem?
>>>
>>> Can you dump the tracebuffer output when the assert hits and send the
>>> dump over here?
>>>
>>> --
>>> Jan Stoess
>>> KIT/UKa System Architecture Group
>>> Phone: +49 (721) 608 4056
>>> Fax: +49 (721) 608 7664
>>> http://os.ibds.kit.edu/stoess
>>>
>>>
>>
>

irq-halted-state.log.bz2
Description: BZip2 compressed data
