Re: [gem5-dev] o3 cpu: possible bug in interrupt handling

Joel Hestness Thu, 21 Mar 2013 09:34:00 -0700

Hey guys,
  Ali's assessment is correct, namely, that higher priority interrupts can
be dropped if the x86 interrupts device receives a low priority interrupt
followed by a high priority interrupt before either is serviced.  From what
I recall, this problem is unique to x86, because it is the only
architecture gem5 supports that allows multiple concurrent interrupts.
 This means that you can consider servicing interrupts out of order - in
this case, servicing the higher priority interrupt first - unlike other
architectures, which will service the interrupts in FIFO order.


@Nilay:
  Your solution is definitely a step in the right direction, but I think we
should go a step farther, because this will still instantiate and drop
fault object instances with the interrupt pointer.  Yes, faults are
ref-counted, but we don't even need to instantiate them: Your solution
reduces the commit stage interrupt pointer to a bool which indicates
whether there is an interrupt pending.  Why not replace the pointer with a
bool, say "interruptPending", which is used to signal whether an interrupt
is pending instead of the interrupt pointer being non-NULL for all gating
in the commit pipeline?  Instead of calling getInterrupts() as we do
currently and having commit hang onto a fault instance with the interrupt
pointer, the O3CPU should call a different (new) interrupts device
interface function isInterruptPending(), which returns a bool if there is
an interrupt to service.  This would greatly clarify the use of these
variables in the commit stage and it would reduce the risk of future code
modifications that mishandle the interrupt pointer.  The obvious downside
of this solution is that it will require updating the interrupts devices
for every architecture, which may be a fair amount of work, and testing
these changes is very non-trivial.

  Thoughts?
  Joel


On Thu, Mar 21, 2013 at 9:49 AM, Ali Saidi <sa...@umich.edu> wrote:

>
>
> Hi Nilay,
>
> I'm a little surprised at this issue. I could see that
> another higher priority interrupt comes along, but if that is the case
> it should be taken after the lower priority interrupt (nearly
> immediately). It could be considered a bug, but on the other hand the
> CPU has to decide what interrupt it is going to take at some point and
> act on it, so perhaps not.
>
> I'm guessing something like the following
> happens:
>
> assert(low_priority);
>
> interrupt = getInterrupt(); //
> low_priority
>
> assert(high_priority);
>
> handle_interrupt(); // just
> clears the highest priority interrupt set??
>
> I suppose I don't have any
> issues with your fix, provided there is a nice comment describing
> what/why. I'd like to see if we are in fact doing the above.
>
> Thanks,
>
>
> Ali
>
> On 20.03.2013 23:54, Nilay wrote:
>
> > Hi
> >
> > While testing some
> patches for the x86 architecture, I came across a
> > problem in which the
> system does nothing for several seconds. This time is
> > the time of the
> target machine. This behavior is accompanied by the
> > following message
> on the console for target machine --
> >
> > hda: dma_timer_expiry: dma
> status == 0x64
> > hda: DMA interrupt recovery
> > hda: lost interrupt
> >
> >
> Joel Hestness, who had seen the problem before, provided me with a
> patch
> > that solves the problem. From my conversation with Joel and
> after looking
> > at the code my self, it appears that the problem is with
> the fact that the
> > commit stage of the pipeline keeps a local copy of
> the interrupt object.
> > Since the interrupt is usually handled several
> cycles after the commit
> > stage becomes aware of it, it is possible that
> the local copy of the
> > interrupt object may not be the correct
> interrupt when the interrupt is
> > actually handled. It is possible that
> another interrupt occurred in the
> > interval between interrupt detection
> and interrupt handling. I am
> > proposing the following solution
> (slightly different from Joel's proposal)
> > to handle this problem:
> >
> >
> diff --git a/src/cpu/o3/commit_impl.hh b/src/cpu/o3/commit_impl.hh
> > ---
> a/src/cpu/o3/commit_impl.hh
> > +++ b/src/cpu/o3/commit_impl.hh
> > @@
> -753,7 +753,7 @@
> > }
> >
> > // CPU will handle interrupt.
> > -
> cpu->processInterrupts(interrupt);
> > +
> cpu->processInterrupts(cpu->getInterrupts());
> >
> >
> thread[0]->noSquashFromTC = false;
> >
> > The code above ignores the local
> copy of the interrupt object and fetches
> > a new one from the CPU
> object.
> >
> > There are several questions that need to be addressed here.
> Is there an
> > actual bug in the o3 cpu? Is the diagnosis correct? Is the
> solution
> > acceptable?
> >
> > Thanks
> > Nilay
> >
> >
> _______________________________________________
> > gem5-dev mailing
> list
> > gem5-dev@gem5.org
> > http://m5sim.org/mailman/listinfo/gem5-dev
> [1]
>
>
>
> Links:
> ------
> [1] http://m5sim.org/mailman/listinfo/gem5-dev
> _______________________________________________
> gem5-dev mailing list
> gem5-dev@gem5.org
> http://m5sim.org/mailman/listinfo/gem5-dev
>



-- 
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://www.cs.utexas.edu/~hestness
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev

Re: [gem5-dev] o3 cpu: possible bug in interrupt handling

Reply via email to