Gabe, I think I understand the issue better now. I think this problem may
still occur even if the prediction was correct. Is it not possible that
only part of the macro-op was fetched before the mode was switched, in
which case fetch in user mode would still cause a problem? I agree with
the solution that you suggested, because with your solution the macro-op
would not be fetched again.

Thanks for all the explanation!
Nilay


On Tue, November 15, 2011 3:45 am, Gabe Black wrote:
> Even then, marking those microops non-speculative wouldn't fix the
> problem anyway. That would make O3 wait until they were at the head of
> the commit queue before executing them, but any branch could still
> easily be mispredicted. The mispredicted instruction wouldn't start
> executing, but that doesn't matter since a squash would still happen and
> fetch would still behave badly. You'd have to eliminate the need for
> branch prediction and memory dependence prediction altogether so that
> you'd never have to squash part of an iret, and that would mean allowing
> only one microop in flight at a time. O3 doesn't know how to do that,
> and if it did it would severely impact performance.
>
> Gabe
>
> On 11/15/11 01:20, Gabe Black wrote:
>> Serializing and being non-speculative are not the same thing and one
>> doesn't imply the other. The properties of the macroop do not apply to
>> all the microops. There's no reason at all to make an add microop in the
>> iret non-speculative. The microops which update state irreversibly are
>> non-speculative, but being non-speculative doesn't matter here. The
>> microop which changes the mode wasn't misspeculated, it was supposed to
>> execute. In a real CPU, iret or any other instruction complicated enough
>> for internal control flow would probably execute out of the microcode
>> ROM, and then there wouldn't be any need to fetch the instruction again
>> either.
>>
>> Gabe
>>
>> On 11/14/11 15:20, Nilay Vaish wrote:
>>> I checked AMD and Intel's processor manuals. Both state that iret is a
>>> serializing instruction, which means that iret will not be executed
>>> speculatively. I would expect even the micro-ops are executed in a
>>> non-speculative fashion.
>>>
>>> --
>>> Nilay
>>>
>>> On Mon, 14 Nov 2011, Steve Reinhardt wrote:
>>>
>>>> That would be one solution.  It would have some performance cost, but
>>>> depending on how often complex non-speculative macro-instructions get
>>>> executed, it might not be too bad.
>>>>
>>>> Another question is whether it makes sense to dynamically predict internal
>>>> micro-branches with the same predictor we use for macro-instruction
>>>> branches.  I honestly don't know how our processors do it, but I would not
>>>> be surprised if the dynamic predictor only worked on macro-instructions,
>>>> and micro-branches had some static hint bit or something like that.  That
>>>> doesn't directly affect this bug (since you would still need recovery
>>>> regardless of how you predicted the micro-branch), but this discussion does
>>>> make me wonder if our model is realistic.
>>>>
>>>> Steve
>>>>
>>>> On Sun, Nov 13, 2011 at 10:54 PM, Nilay <[email protected]> wrote:
>>>>
>>>>> Well, I still don't get it. Do out-of-order CPUs speculate on iret? If
>>>>> iret is to be executed non-speculatively, I would expect micro-ops that
>>>>> are part of iret are executed non-speculatively.
>>>>>
>>>>> --
>>>>> Nilay
>>>>>
>>>>> On Sun, November 13, 2011 11:14 pm, Steve Reinhardt wrote:
>>>>>> Thanks for the more detailed explanation... that helped a lot.  Sounds to
>>>>>> me like you're on the right track.
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Sun, Nov 13, 2011 at 8:20 PM, Gabe Black <[email protected]> wrote:
>>>>>>> No, we're not trying to undo anything. An example might help. Let's look
>>>>>>> at a dramatically simplified version of iret, the instruction that
>>>>>>> returns from an interrupt handler. The microops might do the following.
>>>>>>>
>>>>>>> 1. Restore prior privilege level.
>>>>>>> 2. If we were in kernel level, skip to 4.
>>>>>>> 3. Restore user level stack.
>>>>>>> 4. End.
>>>>>>>
>>>>>>> O3 fetches the bytes that go with iret, decodes that to a macroop, and
>>>>>>> starts picking microops out of it. Microop 1 is executed and drops to
>>>>>>> user level. Now microop 2 is executed, and O3 misspeculates that the
>>>>>>> branch is taken (for example). The mispredict is detected, and later
>>>>>>> microops in flight are squashed. O3 then attempts to restart where it
>>>>>>> should have gone, microop 3.
>>>>>>>
>>>>>>> Now, O3 looks at the PC involved and starts fetching the bytes which
>>>>>>> become the macroop which the microops are pulled from. Because microop 1
>>>>>>> successfully completed, the CPU is now at user level, but because the
>>>>>>> iret is on a kernel page, it can't be accessed. The kernel gets a page
>>>>>>> fault.
>>>>>>>
>>>>>>> As I mentioned before, my partially implemented fix is to not only pass
>>>>>>> back the PC, but to also pass back the macroop fetch should use instead
>>>>>>> of making it refetch memory. The problem is that it's partially
>>>>>>> implemented, and the way squashes work in O3 makes it really tricky to
>>>>>>> implement it properly, or to tell whether or not it's implemented
>>>>>>> properly.
>>>>>>>
>>>>>>> Gabe
>>>>>>>
>>>>>>>
>>>>>>> On 11/13/11 19:21, Steve Reinhardt wrote:
>>>>>>>> I'd like to understand the issue a little better before commenting on a
>>>>>>>> solution.
>>>>>>>>
>>>>>>>> Gabe, when you say "instruction" in your original description, do you
>>>>>>>> mean micro-op?
>>>>>>>>
>>>>>>>> It seems to me that the fundamental problem is that we're trying to undo
>>>>>>>> the effects of a non-speculative micro-op, correct?  So the solution
>>>>>>>> you're pursuing is that branch mispredictions only roll back to the
>>>>>>>> offending micro-op, and don't force the entire macro-op containing that
>>>>>>>> micro-op to re-execute?
>>>>>>>>
>>>>>>>> Is this predicted control flow entirely internal to the macro-op?  Or is
>>>>>>>> this an RFI where we are integrating the control transfer and the
>>>>>>>> privilege change?  If it is the latter, why does the RFI need to get
>>>>>>>> squashed at all?
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
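
The failure described in this thread, and the fix Gabe proposes, can be
sketched with a small toy model. This is not gem5 code: every name here
(MacroOp, fetch_macroop, restart_after_squash, the 0x1000 address, the
privilege constants) is hypothetical and exists only for illustration. The
model assumes the simplified four-microop iret from Gabe's example and a
single kernel-only page that holds the iret bytes.

```python
from dataclasses import dataclass

# Hypothetical privilege levels for the toy model.
KERNEL, USER = 0, 3


class PageFault(Exception):
    """Raised when fetch touches a kernel page from user level."""


@dataclass(frozen=True)
class MacroOp:
    name: str
    microops: tuple


# Hypothetical memory: one kernel-only page holding the decoded iret.
KERNEL_PAGE = {
    0x1000: MacroOp(
        "iret",
        (
            "restore_privilege",     # microop 1
            "branch_if_was_kernel",  # microop 2
            "restore_user_stack",    # microop 3
            "end",                   # microop 4
        ),
    )
}


def fetch_macroop(pc, privilege):
    """Fetch and decode the bytes at pc; faults if not at kernel level."""
    if privilege != KERNEL:
        raise PageFault(f"user-level fetch of kernel page at {pc:#x}")
    return KERNEL_PAGE[pc]


def restart_after_squash(pc, upc, privilege, macroop=None):
    """Resume at microop index upc after a mispredicted micro-branch.

    If the squash passes back the already-decoded macroop, reuse it.
    Otherwise fall back to refetching from memory, which is the
    behaviour that triggers the page fault in this model.
    """
    if macroop is None:
        macroop = fetch_macroop(pc, privilege)
    return macroop.microops[upc]


# The iret is first fetched at kernel level, then microop 1 drops to user.
iret = fetch_macroop(0x1000, KERNEL)
privilege = USER

# Microop 2 was mispredicted; restart at microop 3 (index 2).
try:
    restart_after_squash(0x1000, 2, privilege)
    refetch_faulted = False
except PageFault:
    refetch_faulted = True

# Passing the macroop back with the PC avoids the refetch entirely.
resumed = restart_after_squash(0x1000, 2, privilege, macroop=iret)
```

In this model the refetch path faults because the privilege change made by
microop 1 has already taken effect, while the path that carries the macroop
along with the PC resumes at microop 3 without touching memory. The model
says nothing about how squashes are actually plumbed through O3, which is
the part Gabe describes as tricky.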


_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
