::@Nilay: Would be good to get your input::
Hey guys,
I've spent some time digging into this dma_timer_expiry bug, and I'm
close to having it sorted out. It appears that when interrupts are
propagated through the O3CPU pipeline to commit, an interrupt may be
dropped if more than one interrupt is raised at a CPU at the same time.
Here is the code for the propagateInterrupt function in
src/cpu/o3/commit_impl.hh that might be the problem (emphasis marked with
asterisks):
------------------------------------------------------
template <class Impl>
void
DefaultCommit<Impl>::propagateInterrupt()
{
    if (commitStatus[0] == TrapPending || *interrupt* || trapSquash[0] ||
        tcSquash[0])
        *return;*

    // Process interrupts if interrupts are enabled, not in PAL
    // mode, and no other traps or external squashes are currently
    // pending.
    // @todo: Allow other threads to handle interrupts.

    // Get any interrupt that happened
    *interrupt = cpu->getInterrupts();*

    // Tell fetch that there is an interrupt pending. This
    // will make fetch wait until it sees a non PAL-mode PC,
    // at which point it stops fetching instructions.
    if (interrupt != NoFault)
        toIEW->commitInfo[0].interruptPending = true;
}
------------------------------------------------------
If a second interrupt arrives while the interrupt variable is still set
from the first, propagateInterrupt returns early and never calls
cpu->getInterrupts() for the second interrupt, effectively dropping it.
There's also a priority inversion here: if the second interrupt has higher
priority than the first, the first is still serviced first regardless.
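To make the failure mode concrete, here is a stripped-down model of the early return. This is only a sketch, not gem5 code: ToyCpu, ToyCommit, Outcome and runTwoArrivals are hypothetical names, -1 stands in for NoFault, and the rule that the highest vector number wins is an assumption made for the toy. It also does not model what happens to the pending interrupt afterward.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only: ToyCpu and ToyCommit are hypothetical, not gem5 classes.
const int kNoInterrupt = -1;  // stands in for NoFault

struct ToyCpu {
    std::vector<int> raised;  // vectors currently raised at this CPU

    // Assumption for this sketch: the highest vector number wins.
    int getInterrupts() const {
        if (raised.empty())
            return kNoInterrupt;
        return *std::max_element(raised.begin(), raised.end());
    }
};

struct ToyCommit {
    ToyCpu &cpu;
    int interrupt;  // latched interrupt, like the `interrupt` member
    int polls;      // number of times getInterrupts() was actually called

    explicit ToyCommit(ToyCpu &c)
        : cpu(c), interrupt(kNoInterrupt), polls(0) {}

    void propagateInterrupt() {
        // Same shape as the guard in the quoted function: bail out if
        // an interrupt is already latched.
        if (interrupt != kNoInterrupt)
            return;
        ++polls;
        interrupt = cpu.getInterrupts();
    }
};

struct Outcome {
    int latched;      // what commit ends up holding
    int polls;        // how often the CPU was asked
    int bestPending;  // what the CPU would report if asked now
};

Outcome runTwoArrivals() {
    ToyCpu cpu;
    ToyCommit commit(cpu);

    cpu.raised.push_back(52);     // console interrupt arrives
    commit.propagateInterrupt();  // latches 52
    cpu.raised.push_back(62);     // disk interrupt arrives before 52 is serviced
    commit.propagateInterrupt();  // early return: the CPU is not polled again

    Outcome out = {commit.interrupt, commit.polls, cpu.getInterrupts()};
    return out;
}
```

In this model, runTwoArrivals() leaves 52 latched after a single poll, even though the toy CPU would report 62 if it were asked again.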
Currently, I see this manifesting itself as dropped interrupts. The
common case I'm seeing is a running process printing output to the
console (interrupt 52) while the disk is actively making DMA transfers
(interrupt 62). This can either cause the console output to stop printing
or, in the more problematic case, cause the IDE disk transfers to fail to
complete until Linux later times out and retries the request.
I'll be exploring ways to fix this shortly, but I'm definitely out of my
depth when it comes to the O3CPU. I'd appreciate any advice on how to fix
this.
Thanks!
Joel
On Thu, Sep 20, 2012 at 12:29 PM, Joel Hestness <[email protected]> wrote:
> Hey guys,
> I'm having trouble with the MOESI_hammer protocol losing interrupts for
> the IDE device in x86 full-system. The following message is printed to the
> system.pc.com_1.terminal:
>
> "
> hda: dma_timer_expiry: DMA status (0x64)
> hda: DMA interrupt recovery
> hda: lost interrupt
> "
>
> Many of my simulations are still completing correctly. However, I'm
> concerned because this lost interrupt appears to insert a bubble of a
> couple simulated seconds into the runtime of my benchmarks, presumably
> while the operating system tries to reclaim the lost interrupt.
>
> Anyone have pointers for how I might go about tracking this down?
> Thanks,
> Joel
>
>
> --
> Joel Hestness
> PhD Student, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://www.cs.utexas.edu/~hestness
>
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev