That would be one solution.  It would have some performance cost, but
depending on how often complex non-speculative macro-instructions get
executed, it might not be too bad.

Another question is whether it makes sense to dynamically predict internal
micro-branches with the same predictor we use for macro-instruction
branches.  I honestly don't know how our processors do it, but I would not
be surprised if the dynamic predictor only worked on macro-instructions,
and micro-branches had some static hint bit or something like that.  That
doesn't directly affect this bug (since you would still need recovery
regardless of how you predicted the micro-branch), but this discussion does
make me wonder if our model is realistic.
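
To make the recovery problem concrete, here is a minimal toy sketch in
Python of the scenario Gabe describes below. It is not gem5 code: ToyCPU,
fetch_macroop, run_iret, the page address 0x1000 and the microop names are
all hypothetical, invented only for illustration. It assumes the simplified
four-microop iret from the quoted example and models the fetch-time
privilege check as a simple set lookup.

```python
# Toy model only; none of these names come from gem5.
KERNEL, USER = 0, 3


class PageFault(Exception):
    """Raised when fetch touches a kernel-only page at user level."""


class ToyCPU:
    def __init__(self):
        self.level = KERNEL
        # Assumption: the iret bytes live on a kernel-only page.
        self.kernel_only_pages = {0x1000}
        self.fetch_count = 0

    def fetch_macroop(self, pc):
        """Fetch from memory, enforcing the current privilege level."""
        if pc in self.kernel_only_pages and self.level != KERNEL:
            raise PageFault(hex(pc))
        self.fetch_count += 1
        return ["restore_priv", "branch_if_kernel",
                "restore_user_stack", "end"]


def run_iret(cpu, pc, reuse_macroop):
    """Execute the toy iret, mispredicting the micro-branch (microop 2)."""
    macroop = cpu.fetch_macroop(pc)
    # Microop 1 completes non-speculatively and drops to user level.
    cpu.level = USER
    # Microop 2 is mispredicted as taken; younger microops are squashed
    # and execution has to restart at microop 3.
    if not reuse_macroop:
        # Restart by refetching: faults, because we are now at user level.
        macroop = cpu.fetch_macroop(pc)
    # Otherwise the squash hands the already-decoded macroop back to fetch.
    return macroop[2:]
```

Calling run_iret(ToyCPU(), 0x1000, reuse_macroop=False) raises PageFault,
while reuse_macroop=True returns the remaining microops without a second
fetch. How the micro-branch was predicted does not enter into it; only the
restart path matters.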

Steve

On Sun, Nov 13, 2011 at 10:54 PM, Nilay <[email protected]> wrote:

> Well, I still don't get it. Do out-of-order CPUs speculate on iret? If
> iret is to be executed non-speculatively, I would expect the micro-ops that
> are part of iret to be executed non-speculatively as well.
>
> --
> Nilay
>
> On Sun, November 13, 2011 11:14 pm, Steve Reinhardt wrote:
> > Thanks for the more detailed explanation... that helped a lot.  Sounds to
> > me like you're on the right track.
> >
> > Steve
> >
> > On Sun, Nov 13, 2011 at 8:20 PM, Gabe Black <[email protected]> wrote:
> >
> >> No, we're not trying to undo anything. An example might help. Let's look
> >> at a dramatically simplified version of iret, the instruction that
> >> returns from an interrupt handler. The microops might do the following.
> >>
> >> 1. Restore prior privilege level.
> >> 2. If we were in kernel level, skip to 4.
> >> 3. Restore user level stack.
> >> 4. End.
> >>
> >> O3 fetches the bytes that go with iret, decodes that to a macroop, and
> >> starts picking microops out of it. Microop 1 is executed and drops to
> >> user level. Now microop 2 is executed, and O3 misspeculates that the
> >> branch is taken (for example). The mispredict is detected, and later
> >> microops in flight are squashed. O3 then attempts to restart where it
> >> should have gone, microop 3.
> >>
> >> Now, O3 looks at the PC involved and starts fetching the bytes which
> >> become the macroop which the microops are pulled from. Because microop 1
> >> successfully completed, the CPU is now at user level, but because the
> >> iret is on a kernel page, it can't be accessed. The kernel gets a page
> >> fault.
> >>
> >> As I mentioned before, my partially implemented fix is to not only pass
> >> back the PC, but to also pass back the macroop fetch should use instead
> >> of making it refetch memory. The problem is that it's partially
> >> implemented, and the way squashes work in O3 makes it really tricky to
> >> implement it properly, or to tell whether or not it's implemented
> >> properly.
> >>
> >> Gabe
> >>
> >>
> >> On 11/13/11 19:21, Steve Reinhardt wrote:
> >> > I'd like to understand the issue a little better before commenting on a
> >> > solution.
> >> >
> >> > Gabe, when you say "instruction" in your original description, do you mean
> >> > micro-op?
> >> >
> >> > It seems to me that the fundamental problem is that we're trying to undo
> >> > the effects of a non-speculative micro-op, correct?  So the solution you're
> >> > pursuing is that branch mispredictions only roll back to the offending
> >> > micro-op, and don't force the entire macro-op containing that micro-op to
> >> > re-execute?
> >> >
> >> > Is this predicted control flow entirely internal to the macro-op?  Or is
> >> > this an RFI where we are integrating the control transfer and the privilege
> >> > change?  If it is the latter, why does the RFI need to get squashed at all?
> >> >
> >> > Steve
> >> >
> >> > On Sun, Nov 13, 2011 at 4:34 PM, Gabe Black <[email protected]> wrote:
> >> >
> >> >> Yes, this is an existing bug and the branch predictor just pokes things
> >> >> in the right way to expose it. The macroop isn't passed back in this
> >> >> particular case, and with the code the way it is, it's difficult to even
> >> >> tell that that's the case, let alone how to fix it. Cleaning things up
> >> >> won't fix the problem itself, but it will make fixing the actual problem
> >> >> tractable.
> >> >>
> >> >> Gabe
> >> >>
> >> >> On 11/13/11 16:16, Ali Saidi wrote:
> >> >>> I think this bug is just latent in the code right now and the branch
> >> >>> predictor change runs into it (this patch causes that branch to be
> >> >>> mispredicted). In any case I think the issue exists today and it's just
> >> >>> luck that it works currently.
> >> >>>
> >> >>> Looking at your list I imagine you should be able to recover most things
> >> >>> from the dyninst, however I don't know if that is actually the case.
> >> >>> Accepted that the squashing mechanisms should be cleaned up, I'm not sure
> >> >>> how that is actually going to solve the problem. Don't we currently send
> >> >>> back the instruction? With the current instructions can't you figure out
> >> >>> the macro-op it belongs to?
> >> >>>
> >> >>> Ali
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Nov 13, 2011, at 5:40 PM, Gabe Black wrote:
> >> >>>
> >> >>>> Hey folks. Ali has had a change out for a while ("Fix several Branch
> >> >>>> Predictor issues") which improves branch predictor performance
> >> >>>> substantially but breaks X86_FS on O3. It turns out the problem is that
> >> >>>> an instruction is started which returns from kernel to user level and is
> >> >>>> microcoded. The instruction is fetched from the kernel's address space
> >> >>>> successfully and starts to execute, along the way dropping down to user
> >> >>>> mode. Some microops later, there's some microop control flow which O3
> >> >>>> mispredicts. When it squashes the mispredict and tries to restart, it
> >> >>>> first tries to refetch the instruction involved. Since it's now at user
> >> >>>> level and the instruction is on a kernel level only page, there's a page
> >> >>>> fault and things go downhill from there.
> >> >>>>
> >> >>>> I partially implemented a solution to this before where O3 reinstates
> >> >>>> the macroop it had been using when it restarts fetch. The problem here
> >> >>>> is that the path this kind of squash takes doesn't pass back the right
> >> >>>> information, and my attempts to fix that have been unsuccessful. The
> >> >>>> code that handles squashing in O3 is too complex, there's too much going
> >> >>>> on in all directions, and it's not always very clear what effect a change
> >> >>>> will have in unrelated situations, or which callsites are involved in a
> >> >>>> particular type of fault.
> >> >>>>
> >> >>>> To me, it seems like the first step in fixing this problem is to clean
> >> >>>> up how squashes are handled in O3 so that they can be made to
> >> >>>> consistently handle squashes in non-restartable macroops.
> >> >>>>
> >> >>>> Without having really dug into the specifics, I think we only need two
> >> >>>> pieces of information when squashing, a pointer to the guilty
> >> >>>> instruction and whether execution should start at or after it. It would
> >> >>>> start at it if the instruction needed to be reexecuted due to a memory
> >> >>>> dependence violation, for instance, and would start after it for faults,
> >> >>>> interrupts, or branch mispredicts. Any other information that's needed
> >> >>>> like sequence numbers or actual control flow targets can be retrieved
> >> >>>> from the instructions where needed without having to split everything
> >> >>>> out and pass them around individually.
> >> >>>>
> >> >>>> Is there any obvious problem with doing things this way? I don't think
> >> >>>> I'll personally have a lot of time to dedicate to this at the very least
> >> >>>> in the short term, but I wanted to get the conversation going so we know
> >> >>>> what to do when somebody has a chance to do it.
> >> >>>>
> >> >>>> Gabe
>
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
>
