So your fix is basically to not allow a interrupt to happen in between a macroop since that leads to imprecise state and the current problem? If so, I think that's reasonable and definitely it's good work on your part! Persistence pays off.
Although, I think there is probably a more descriptive name than "instruction in flight" and also someone may have a better way to encapsulate this behavior. I'd suggest posting this to the reviewboard for further comments on this patch. On Fri, Dec 30, 2011 at 4:16 PM, Nilay Vaish <[email protected]> wrote: > The following patch takes care of the current problem. I am able to > replicate the O3 full system regression test with Ruby in place of the > classic memory system. > > diff --git a/src/cpu/o3/commit.hh b/src/cpu/o3/commit.hh > --- a/src/cpu/o3/commit.hh > +++ b/src/cpu/o3/commit.hh > @@ -443,6 +443,9 @@ > /** Rename map interface. */ > RenameMap *renameMap[Impl::MaxThreads]; > > + /* Variable for keeping track if an instruction is in middle of > execution */ > + bool instruction_in_flight; > + > /** Updates commit stats based on this instruction. */ > void updateComInstStats(DynInstPtr &inst); > > diff --git a/src/cpu/o3/commit_impl.hh b/src/cpu/o3/commit_impl.hh > --- a/src/cpu/o3/commit_impl.hh > +++ b/src/cpu/o3/commit_impl.hh > @@ -105,7 +105,8 @@ > numThreads(params->numThreads)**, > drainPending(false), > switchedOut(false), > - trapLatency(params->**trapLatency) > + trapLatency(params->**trapLatency), > + instruction_in_flight(false) > { > _status = Active; > _nextStatus = Inactive; > @@ -717,7 +718,7 @@ > > > // Wait until all in flight instructions are finished before enterring > // the interrupt. > - if (cpu->instList.empty()) { > + if (cpu->instList.empty() && (!instruction_in_flight)) { > > // Squash or record that I need to squash this cycle if > // an interrupt needed to be handled. > DPRINTF(Commit, "Interrupt detected.\n"); > @@ -990,6 +991,8 @@ > cpu->instDone(tid); > } > > + instruction_in_flight = !head_inst->isLastMicroop(); > + > // Updates misc. registers. > head_inst->updateMiscRegs(); > > -- > Nilay > > > On Fri, 30 Dec 2011, Nilay Vaish wrote: > > Gabe, from all the traces that I have been through, I never found your >> explanation to be true. Here is an excerpt from your first email in that >> thread -- >> >> ---- >> It turns out the problem is that an instruction is started which returns >> from kernel to user level and is microcoded. The instruction is fetched >> from the kernel's address space successfully and starts to execute, along >> the way dropping down to user mode. Some microops later, there's some >> microop control flow which O3 mispredicts. When it squashes the mispredict >> and tries to restart, it first tries to refetch the instruction involved. >> Since it's now at user level and the instruction is on a kernel level only >> page, there's a page fault and things go downhill from there. >> ---- >> >> You claim that because of branch misprediction, the instruction needs to >> be refetched. I never saw that happening. In all the cases your fix, in >> which the fetch stage picks the current macroop from the just now squashed >> instruction, worked as expected. >> >> What I saw was that an isSquashAfter microop squashes everything and this >> makes the O3 CPU start on handling interrupt. Returning from the interrupt, >> it faults since the CS register had been overwritten by the sysret >> instruction, but not the instruction pointer. >> >> Again, I think we need to change the behavior of isSquashAfter. >> >> -- >> Nilay >> >> On Fri, 30 Dec 2011, Korey Sewell wrote: >> >> I can't vouch for reading all the emails but I have gone through this >>> whole >>> thread (which dates back to Nov. 29th). >>> >>> Also, I'm not all the way familiar with x86 so maybe this excludes me >>> from >>> understanding the problem at the detailed level, but I think I am >>> starting >>> to get a good grasp of the general squashing problem here (basically >>> maintaining squash state through exception events). >>> >>> My concern is that if you don't literally "fix" the problem first, you >>> can >>> get caught up in the minutia of making this big grand sweeping change and >>> then have no good way to say if "the fix" fixes anything in the first >>> place. >>> >>> If Nilay or anyone could get something to the reviewboard that worked, >>> hack >>> or not, then that would be a good step toward making the "clean" change >>> that I think you're referring to Gabe. We dont have to commit the code, >>> but >>> on a 1st pass working is better then "not working", right? :) >>> >>> (Gabe, I do understand it can be frustrating explaining the same things >>> over/over again.) >>> >>> On Fri, Dec 30, 2011 at 3:48 AM, Gabe Black <[email protected]> >>> wrote: >>> >>> If you read my emails the problem would already be identified and >>>> understood, because I did that weeks or even months ago and explained it >>>> multiple times. A hack fix is not ok. A hack fix is why this is still >>>> broken in the first place. That's also something I explained in my >>>> emails. >>>> >>>> Gabe >>>> >>>> On 12/30/11 02:50, Korey Sewell wrote: >>>> >>>>> I agree with you Gabe that the squashing mechanism could be cleaned up. >>>>> >>>>> But I'd also suggest that Nilay should understand/identify the problem >>>>> first and then implement a first-pass fix without a big squashing >>>>> revamp >>>>> (if possible). >>>>> >>>>> That way, when we (nilay, you, me, whoever) gets to revamping the >>>>> squash >>>>> code, there is at least a set test case and trace we can use to debug >>>>> >>>> with.. >>>> >>>>> >>>>> On Fri, Dec 30, 2011 at 2:30 AM, Gabe Black <[email protected]> >>>>> >>>> wrote: >>>> >>>>> >>>>> What was unclear about this email and the ones before it? Did you not >>>>>> believe me for some reason? You've spent about a month partially >>>>>> rediscovering what I explained in them. I've already said how this >>>>>> needs >>>>>> to be fixed. >>>>>> >>>>>> Gabe >>>>>> ______________________________**_________________ >>>>>> gem5-dev mailing list >>>>>> [email protected] >>>>>> http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev> >>>>>> >>>>>> >>>>> >>>>> >>>> ______________________________**_________________ >>>> gem5-dev mailing list >>>> [email protected] >>>> http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev> >>>> >>>> >>> >>> >>> -- >>> - Korey >>> ______________________________**_________________ >>> gem5-dev mailing list >>> [email protected] >>> http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev> >>> >>> >> ______________________________**_________________ > gem5-dev mailing list > [email protected] > http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev> > -- - Korey _______________________________________________ gem5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/gem5-dev
