So your fix is basically to not allow a interrupt to happen in between a
macroop since that leads to imprecise state and the current problem? If so,
I think that's reasonable and definitely it's good work on your part!
Persistence pays off.

Although, I think there is probably a more descriptive name than
"instruction in flight" and also someone may have a better way to
encapsulate this behavior.

I'd suggest posting this to the reviewboard for further comments on this
patch.

On Fri, Dec 30, 2011 at 4:16 PM, Nilay Vaish <[email protected]> wrote:

> The following patch takes care of the current problem. I am able to
> replicate the O3 full system regression test with Ruby in place of the
> classic memory system.
>
> diff --git a/src/cpu/o3/commit.hh b/src/cpu/o3/commit.hh
> --- a/src/cpu/o3/commit.hh
> +++ b/src/cpu/o3/commit.hh
> @@ -443,6 +443,9 @@
>     /** Rename map interface. */
>     RenameMap *renameMap[Impl::MaxThreads];
>
> +    /* Variable for keeping track if an instruction is in middle of
> execution */
> +    bool instruction_in_flight;
> +
>     /** Updates commit stats based on this instruction. */
>     void updateComInstStats(DynInstPtr &inst);
>
> diff --git a/src/cpu/o3/commit_impl.hh b/src/cpu/o3/commit_impl.hh
> --- a/src/cpu/o3/commit_impl.hh
> +++ b/src/cpu/o3/commit_impl.hh
> @@ -105,7 +105,8 @@
>       numThreads(params->numThreads)**,
>       drainPending(false),
>       switchedOut(false),
> -      trapLatency(params->**trapLatency)
> +      trapLatency(params->**trapLatency),
> +      instruction_in_flight(false)
>  {
>     _status = Active;
>     _nextStatus = Inactive;
> @@ -717,7 +718,7 @@
>
>
>     // Wait until all in flight instructions are finished before enterring
>     // the interrupt.
> -    if (cpu->instList.empty()) {
> +    if (cpu->instList.empty() && (!instruction_in_flight)) {
>
>         // Squash or record that I need to squash this cycle if
>         // an interrupt needed to be handled.
>         DPRINTF(Commit, "Interrupt detected.\n");
> @@ -990,6 +991,8 @@
>                     cpu->instDone(tid);
>                 }
>
> +                instruction_in_flight = !head_inst->isLastMicroop();
> +
>                 // Updates misc. registers.
>                 head_inst->updateMiscRegs();
>
> --
> Nilay
>
>
> On Fri, 30 Dec 2011, Nilay Vaish wrote:
>
>  Gabe, from all the traces that I have been through, I never found your
>> explanation to be true. Here is an excerpt from your first email in that
>> thread --
>>
>> ----
>> It turns out the problem is that an instruction is started which returns
>> from kernel to user level and is microcoded. The instruction is fetched
>> from the kernel's address space successfully and starts to execute, along
>> the way dropping down to user mode. Some microops later, there's some
>> microop control flow which O3 mispredicts. When it squashes the mispredict
>> and tries to restart, it first tries to refetch the instruction involved.
>> Since it's now at user level and the instruction is on a kernel level only
>> page, there's a page fault and things go downhill from there.
>> ----
>>
>> You claim that because of branch misprediction, the instruction needs to
>> be refetched. I never saw that happening. In all the cases your fix, in
>> which the fetch stage picks the current macroop from the just now squashed
>> instruction, worked as expected.
>>
>> What I saw was that an isSquashAfter microop squashes everything and this
>> makes the O3 CPU start on handling interrupt. Returning from the interrupt,
>> it faults since the CS register had been overwritten by the sysret
>> instruction, but not the instruction pointer.
>>
>> Again, I think we need to change the behavior of isSquashAfter.
>>
>> --
>> Nilay
>>
>> On Fri, 30 Dec 2011, Korey Sewell wrote:
>>
>>  I can't vouch for reading all the emails but I have gone through this
>>> whole
>>> thread (which dates back to Nov. 29th).
>>>
>>> Also, I'm not all the way familiar with x86 so maybe this excludes me
>>> from
>>> understanding the problem at the detailed level, but I think I am
>>> starting
>>> to get a good grasp of the general squashing problem here (basically
>>> maintaining squash state through exception events).
>>>
>>> My concern is that if you don't literally "fix" the problem first, you
>>> can
>>> get caught up in the minutia of making this big grand sweeping change and
>>> then have no good way to say if "the fix" fixes anything in the first
>>> place.
>>>
>>> If Nilay or anyone could get something to the reviewboard that worked,
>>> hack
>>> or not, then that would be a good step toward making the "clean" change
>>> that I think you're referring to Gabe. We dont have to commit the code,
>>> but
>>> on a 1st pass working is better then "not working", right? :)
>>>
>>> (Gabe, I do understand it can be frustrating explaining the same things
>>> over/over again.)
>>>
>>> On Fri, Dec 30, 2011 at 3:48 AM, Gabe Black <[email protected]>
>>> wrote:
>>>
>>>  If you read my emails the problem would already be identified and
>>>> understood, because I did that weeks or even months ago and explained it
>>>> multiple times. A hack fix is not ok. A hack fix is why this is still
>>>> broken in the first place. That's also something I explained in my
>>>> emails.
>>>>
>>>> Gabe
>>>>
>>>> On 12/30/11 02:50, Korey Sewell wrote:
>>>>
>>>>> I agree with you Gabe that the squashing mechanism could be cleaned up.
>>>>>
>>>>> But I'd also suggest that Nilay should understand/identify the problem
>>>>> first and then implement a first-pass fix without a big squashing
>>>>> revamp
>>>>> (if possible).
>>>>>
>>>>> That way, when we (nilay, you, me, whoever) gets to revamping the
>>>>> squash
>>>>> code, there is at least a set test case and trace we can use to debug
>>>>>
>>>> with..
>>>>
>>>>>
>>>>> On Fri, Dec 30, 2011 at 2:30 AM, Gabe Black <[email protected]>
>>>>>
>>>> wrote:
>>>>
>>>>>
>>>>>  What was unclear about this email and the ones before it? Did you not
>>>>>> believe me for some reason? You've spent about a month partially
>>>>>> rediscovering what I explained in them. I've already said how this
>>>>>> needs
>>>>>> to be fixed.
>>>>>>
>>>>>> Gabe
>>>>>> ______________________________**_________________
>>>>>> gem5-dev mailing list
>>>>>> [email protected]
>>>>>> http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>> ______________________________**_________________
>>>> gem5-dev mailing list
>>>> [email protected]
>>>> http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
>>>>
>>>>
>>>
>>>
>>> --
>>> - Korey
>>> ______________________________**_________________
>>> gem5-dev mailing list
>>> [email protected]
>>> http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
>>>
>>>
>>  ______________________________**_________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
>



-- 
- Korey
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

Reply via email to