Well, I still don't get it. Do out-of-order CPUs speculate on iret? If
iret is to be executed non-speculatively, I would expect the micro-ops
that are part of iret to be executed non-speculatively as well.

--
Nilay

On Sun, November 13, 2011 11:14 pm, Steve Reinhardt wrote:
> Thanks for the more detailed explanation... that helped a lot.  Sounds to
> me like you're on the right track.
>
> Steve
>
> On Sun, Nov 13, 2011 at 8:20 PM, Gabe Black <[email protected]> wrote:
>
>> No, we're not trying to undo anything. An example might help. Let's look
>> at a dramatically simplified version of iret, the instruction that
>> returns from an interrupt handler. The microops might do the following.
>>
>> 1. Restore prior privilege level.
>> 2. If we were in kernel level, skip to 4.
>> 3. Restore user level stack.
>> 4. End.
>>
>> O3 fetches the bytes that go with iret, decodes that to a macroop, and
>> starts picking microops out of it. Microop 1 is executed and drops to
>> user level. Now microop 2 is executed, and O3 misspeculates that the
>> branch is taken (for example). The mispredict is detected, and later
>> microops in flight are squashed. O3 then attempts to restart where it
>> should have gone, microop 3.
>>
>> Now, O3 looks at the PC involved and starts fetching the bytes which
>> become the macroop which the microops are pulled from. Because microop 1
>> successfully completed, the CPU is now at user level, but because the
>> iret is on a kernel page, it can't be accessed. The kernel gets a page
>> fault.
>>
>> As I mentioned before, my partially implemented fix is to not only pass
>> back the PC, but to also pass back the macroop that fetch should use,
>> instead of making it refetch memory. The problem is that it's only
>> partially implemented, and the way squashes work in O3 makes it really
>> tricky to implement it properly, or to tell whether or not it's
>> implemented properly.
>>
>> Gabe
>>
>>
>> On 11/13/11 19:21, Steve Reinhardt wrote:
>> > I'd like to understand the issue a little better before commenting
>> > on a solution.
>> >
>> > Gabe, when you say "instruction" in your original description, do
>> > you mean micro-op?
>> >
>> > It seems to me that the fundamental problem is that we're trying to
>> > undo the effects of a non-speculative micro-op, correct?  So the
>> > solution you're pursuing is that branch mispredictions only roll
>> > back to the offending micro-op, and don't force the entire macro-op
>> > containing that micro-op to re-execute?
>> >
>> > Is this predicted control flow entirely internal to the macro-op?
>> > Or is this an RFI where we are integrating the control transfer and
>> > the privilege change?  If it is the latter, why does the RFI need
>> > to get squashed at all?
>> >
>> > Steve
>> >
>> > On Sun, Nov 13, 2011 at 4:34 PM, Gabe Black <[email protected]> wrote:
>> >
>> >> Yes, this is an existing bug and the branch predictor just pokes
>> >> things in the right way to expose it. The macroop isn't passed back
>> >> in this particular case, and with the code the way it is, it's
>> >> difficult to even tell that that's the case, let alone how to fix
>> >> it. Cleaning things up won't fix the problem itself, but it will
>> >> make fixing the actual problem tractable.
>> >>
>> >> Gabe
>> >>
>> >> On 11/13/11 16:16, Ali Saidi wrote:
>> >>> I think this bug is just latent in the code right now and the
>> >>> branch predictor change runs into it (this patch causes that
>> >>> branch to be mispredicted). In any case I think the issue exists
>> >>> today and it's just luck that it works currently.
>> >>>
>> >>> Looking at your list I imagine you should be able to recover most
>> >>> things from the dyninst, however I don't know if that is actually
>> >>> the case. Excepted that the squashing mechanisms should be cleaned
>> >>> up, I'm not sure how that is actually going to solve the problem.
>> >>> Don't we currently send back the instruction? With the current
>> >>> instructions can't you figure out the macro-op it belongs to?
>> >>>
>> >>> Ali
>> >>>
>> >>>
>> >>>
>> >>> On Nov 13, 2011, at 5:40 PM, Gabe Black wrote:
>> >>>
>> >>>> Hey folks. Ali has had a change out for a while ("Fix several Branch
>> >>>> Predictor issues") which improves branch predictor performance
>> >>>> substantially but breaks X86_FS on O3. It turns out the problem is
>> >>>> that an instruction is started which returns from kernel to user
>> >>>> level and is microcoded. The instruction is fetched from the
>> >>>> kernel's address space successfully and starts to execute, along
>> >>>> the way dropping down to user mode. Some microops later, there's
>> >>>> some microop control flow which O3 mispredicts. When it squashes
>> >>>> the mispredict and tries to restart, it first tries to refetch the
>> >>>> instruction involved. Since it's now at user level and the
>> >>>> instruction is on a kernel level only page, there's a page fault
>> >>>> and things go downhill from there.
>> >>>>
>> >>>> I partially implemented a solution to this before where O3
>> >>>> reinstates the macroop it had been using when it restarts fetch.
>> >>>> The problem here is that the path this kind of squash takes doesn't
>> >>>> pass back the right information, and my attempts to fix that have
>> >>>> been unsuccessful. The code that handles squashing in O3 is too
>> >>>> complex: there's too much going on in all directions, and it's not
>> >>>> always very clear what effect a change will have in unrelated
>> >>>> situations, or which callsites are involved in a particular type
>> >>>> of fault.
>> >>>>
>> >>>> To me, it seems like the first step in fixing this problem is to
>> >>>> clean up how squashes are handled in O3 so that they can be made to
>> >>>> consistently handle squashes in non-restartable macroops.
>> >>>>
>> >>>> Without having really dug into the specifics, I think we only need
>> >>>> two pieces of information when squashing: a pointer to the guilty
>> >>>> instruction and whether execution should start at or after it. It
>> >>>> would start at it if the instruction needed to be reexecuted due to
>> >>>> a memory dependence violation, for instance, and would start after
>> >>>> it for faults, interrupts, or branch mispredicts. Any other
>> >>>> information that's needed like sequence numbers or actual control
>> >>>> flow targets can be retrieved from the instructions where needed
>> >>>> without having to split everything out and pass them around
>> >>>> individually.
>> >>>>
>> >>>> Is there any obvious problem with doing things this way? I don't
>> >>>> think I'll personally have a lot of time to dedicate to this at the
>> >>>> very least in the short term, but I wanted to get the conversation
>> >>>> going so we know what to do when somebody has a chance to do it.
>> >>>>
>> >>>> Gabe
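
For anyone reading along, the failure Gabe walks through in the iret
example can be mimicked with a toy model. This is only a sketch under
loud assumptions: ToyCpu, MacroOp, Squash and run_iret are hypothetical
names made up for illustration and are not gem5's O3 code or API. It
only shows why refetching the macroop after microop 1 has dropped to
user level faults, while passing the macroop back with the squash does
not.

```python
from dataclasses import dataclass
from typing import List, Optional

KERNEL, USER = 0, 3  # toy privilege levels (hypothetical encoding)


class PageFault(Exception):
    """Raised when fetch touches a kernel-only page from user level."""


@dataclass
class MacroOp:
    name: str
    microops: List[str]


@dataclass
class Squash:
    # Macroop to reinstate on restart; None forces a refetch from memory.
    macroop: Optional[MacroOp]
    # Index of the microop to restart at.
    upc: int


class ToyCpu:
    def __init__(self) -> None:
        self.privilege = KERNEL
        # A single kernel-only page holding the simplified iret.
        self.kernel_pages = {
            0x1000: MacroOp("iret", [
                "restore privilege",        # microop 1
                "branch if kernel to end",  # microop 2
                "restore user stack",       # microop 3
                "end",                      # microop 4
            ])
        }

    def fetch(self, pc: int) -> MacroOp:
        # Kernel-only pages are inaccessible once we are at user level.
        if self.privilege != KERNEL:
            raise PageFault(hex(pc))
        return self.kernel_pages[pc]

    def restart(self, pc: int, squash: Squash) -> str:
        # Reuse the macroop carried by the squash if there is one,
        # otherwise fall back to refetching it from memory.
        macroop = squash.macroop
        if macroop is None:
            macroop = self.fetch(pc)
        return macroop.microops[squash.upc]


def run_iret(pass_macroop_back: bool) -> str:
    cpu = ToyCpu()
    pc = 0x1000
    macroop = cpu.fetch(pc)  # fetched at kernel level, succeeds
    cpu.privilege = USER     # microop 1 completes and drops to user level
    # Microop 2 is mispredicted as taken; restart at microop 3 (index 2).
    squash = Squash(macroop if pass_macroop_back else None, upc=2)
    try:
        return cpu.restart(pc, squash)
    except PageFault:
        return "page fault"
```

With the macroop passed back, run_iret(True) resumes at the "restore
user stack" microop; without it, run_iret(False) hits the page fault
described above. How to carry that information cleanly through the real
squash paths is exactly the open question in this thread.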


_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
