Serializing and being non-speculative are not the same thing, and neither implies the other. The properties of a macroop do not automatically apply to all of its microops. There's no reason at all to make an add microop inside iret non-speculative. The microops which update state irreversibly are non-speculative, but being non-speculative doesn't matter here: the microop which changes the mode wasn't misspeculated; it was supposed to execute. In a real CPU, iret or any other instruction complicated enough to have internal control flow would probably execute out of the microcode ROM, so there wouldn't be any need to fetch the instruction again either.
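
As a rough illustration of the restart problem from the iret example quoted below, here is a toy Python model. It is not gem5 code: ToyCpu, DynInst, SquashInfo and all of their fields are made-up names, and the privilege check is only a stand-in for a real page table. It shows a squash that carries just the guilty instruction and an at/after flag, and a restart that reuses the macroop held by that instruction instead of refetching it from memory.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

KERNEL, USER = 0, 3  # privilege levels in this toy model (hypothetical)


class PageFault(Exception):
    """Raised when fetch touches a page the current level may not access."""


@dataclass
class Macroop:
    name: str
    microops: List[str]


@dataclass
class DynInst:
    macroop: Macroop  # the macroop this microop was pulled from
    upc: int          # index of this microop within the macroop
    next_upc: int     # resolved index of the microop that should follow


@dataclass
class SquashInfo:
    inst: DynInst        # the "guilty" instruction
    restart_after: bool  # True: mispredict, fault, interrupt
                         # False: re-execute inst (e.g. memory ordering)


class ToyCpu:
    def __init__(self, memory: Dict[int, Tuple[int, Macroop]]):
        self.level = KERNEL
        self.memory = memory  # pc -> (most permissive level allowed, macroop)

    def fetch(self, pc: int) -> Macroop:
        required, macroop = self.memory[pc]
        if self.level > required:  # user code touching a kernel-only page
            raise PageFault(hex(pc))
        return macroop

    def restart(self, pc: int, squash: SquashInfo, reuse_macroop: bool) -> str:
        # Either reinstate the macroop the instruction came from,
        # or go back to memory for it (which is what faults here).
        if reuse_macroop:
            macroop = squash.inst.macroop
        else:
            macroop = self.fetch(pc)
        upc = squash.inst.next_upc if squash.restart_after else squash.inst.upc
        return macroop.microops[upc]


iret = Macroop("iret", [
    "restore privilege level",
    "branch to end if returning to kernel",
    "restore user stack",
    "end",
])
cpu = ToyCpu({0x1000: (KERNEL, iret)})

cpu.fetch(0x1000)  # initial fetch succeeds at kernel level
cpu.level = USER   # the first microop has executed and dropped privilege

# The branch microop (index 1) was mispredicted; the correct successor is 2.
squash = SquashInfo(DynInst(iret, upc=1, next_upc=2), restart_after=True)

try:
    cpu.restart(0x1000, squash, reuse_macroop=False)
    refetch_faulted = False
except PageFault:
    refetch_faulted = True

resumed = cpu.restart(0x1000, squash, reuse_macroop=True)
```

With reuse_macroop=False the restart faults because the CPU is already at user level; with reuse_macroop=True it resumes at the "restore user stack" microop without touching memory.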

Gabe

On 11/14/11 15:20, Nilay Vaish wrote:
> I checked AMD and Intel's processor manuals. Both state that iret is a
> serializing instruction, which means that iret will not be executed
> speculatively. I would expect even the micro-ops are executed in a
> non-speculative fashion.
>
> --
> Nilay
>
> On Mon, 14 Nov 2011, Steve Reinhardt wrote:
>
>> That would be one solution. It would have some performance cost, but
>> depending on how often complex non-speculative macro-instructions get
>> executed, it might not be too bad.
>>
>> Another question is whether it makes sense to dynamically predict
>> internal micro-branches with the same predictor we use for
>> macro-instruction branches. I honestly don't know how our processors
>> do it, but I would not be surprised if the dynamic predictor only
>> worked on macro-instructions, and micro-branches had some static hint
>> bit or something like that. That doesn't directly affect this bug
>> (since you would still need recovery regardless of how you predicted
>> the micro-branch), but this discussion does make me wonder if our
>> model is realistic.
>>
>> Steve
>>
>> On Sun, Nov 13, 2011 at 10:54 PM, Nilay <[email protected]> wrote:
>>
>>> Well, I still don't get it. Do out-of-order CPUs speculate on iret?
>>> If iret is to be executed non-speculatively, I would expect
>>> micro-ops that are part of iret are executed non-speculatively.
>>>
>>> --
>>> Nilay
>>>
>>> On Sun, November 13, 2011 11:14 pm, Steve Reinhardt wrote:
>>>> Thanks for the more detailed explanation... that helped a lot.
>>>> Sounds to me like you're on the right track.
>>>>
>>>> Steve
>>>>
>>>> On Sun, Nov 13, 2011 at 8:20 PM, Gabe Black <[email protected]> wrote:
>>>>
>>>>> No, we're not trying to undo anything. An example might help.
>>>>> Lets look at a dramatically simplified version of iret, the
>>>>> instruction that returns from an interrupt handler. The microops
>>>>> might do the following.
>>>>>
>>>>> 1. Restore prior privilege level.
>>>>> 2. If we were in kernel level, skip to 4.
>>>>> 3. Restore user level stack.
>>>>> 4. End.
>>>>>
>>>>> O3 fetches the bytes that go with iret, decodes that to a
>>>>> macroop, and starts picking microops out of it. Microop 1 is
>>>>> executed and drops to user level. Now microop 2 is executed, and
>>>>> O3 misspeculates that the branch is taken (for example). The
>>>>> mispredict is detected, and later microops in flight are
>>>>> squashed. O3 then attempts to restart where it should have gone,
>>>>> microop 3.
>>>>>
>>>>> Now, O3 looks at the PC involved and starts fetching the bytes
>>>>> which become the macroop which the microops are pulled from.
>>>>> Because microop 1 successfully completed, the CPU is now at user
>>>>> level, but because the iret is on a kernel page, it can't be
>>>>> accessed. The kernel gets a page fault.
>>>>>
>>>>> As I mentioned before, my partially implemented fix is to not
>>>>> only pass back the PC, but to also pass back the macroop fetch
>>>>> should use instead of making it refetch memory. The problem is
>>>>> that it's partially implemented, and the way squashes work in O3
>>>>> make it really tricky to implement it properly, or to tell
>>>>> whether or not it's implemented properly.
>>>>>
>>>>> Gabe
>>>>>
>>>>> On 11/13/11 19:21, Steve Reinhardt wrote:
>>>>>> I'd like to understand the issue a little better before
>>>>>> commenting on a solution.
>>>>>>
>>>>>> Gabe, when you say "instruction" in your original description,
>>>>>> do you mean micro-op?
>>>>>>
>>>>>> It seems to me that the fundamental problem is that we're
>>>>>> trying to undo the effects of a non-speculative micro-op,
>>>>>> correct? So the solution you're pursuing is that branch
>>>>>> mispredictions only roll back to the offending micro-op, and
>>>>>> don't force the entire macro-op containing that micro-op to
>>>>>> re-execute?
>>>>>>
>>>>>> Is this predicted control flow entirely internal to the
>>>>>> macro-op? Or is this an RFI where we are integrating the
>>>>>> control transfer and the privilege change? If it is the latter,
>>>>>> why does the RFI need to get squashed at all?
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Sun, Nov 13, 2011 at 4:34 PM, Gabe Black <[email protected]> wrote:
>>>>>>
>>>>>>> Yes, this is an existing bug and the branch predictor just
>>>>>>> pokes things in the right way to expose it. The macroop isn't
>>>>>>> passed back in this particular case, and with the code the way
>>>>>>> it is, it's difficult to even tell that that's the case, let
>>>>>>> alone how to fix it. Cleaning things up won't fix the problem
>>>>>>> itself, but it will make fixing the actual problem tractable.
>>>>>>>
>>>>>>> Gabe
>>>>>>>
>>>>>>> On 11/13/11 16:16, Ali Saidi wrote:
>>>>>>>> I think this bug is just latently in the code right now and
>>>>>>>> the branch predictor change runs into it (this patch causes
>>>>>>>> that branch to be mispredicted). In any case I think the
>>>>>>>> issue exists today and it's just luck that it works
>>>>>>>> currently.
>>>>>>>>
>>>>>>>> Looking at your list I imagine you should be able to recover
>>>>>>>> most things from the dyninst, however I don't know if that is
>>>>>>>> actually the case. Excepted that the squashing mechanisms
>>>>>>>> should be cleaned up, I'm not sure how that is actually going
>>>>>>>> to solve the problem. Don't we currently send back the
>>>>>>>> instruction? With the current instructions can't you figure
>>>>>>>> out the macro-op it belongs to?
>>>>>>>>
>>>>>>>> Ali
>>>>>>>>
>>>>>>>> On Nov 13, 2011, at 5:40 PM, Gabe Black wrote:
>>>>>>>>
>>>>>>>>> Hey folks. Ali has had a change out for a while ("Fix
>>>>>>>>> several Branch Predictor issues") which improves branch
>>>>>>>>> predictor performance substantially but breaks X86_FS on O3.
>>>>>>>>> It turns out the problem is that an instruction is started
>>>>>>>>> which returns from kernel to user level and is microcoded.
>>>>>>>>> The instruction is fetched from the kernel's address space
>>>>>>>>> successfully and starts to execute, along the way dropping
>>>>>>>>> down to user mode. Some microops later, there's some microop
>>>>>>>>> control flow which O3 mispredicts. When it squashes the
>>>>>>>>> mispredict and tries to restart, it first tries to refetch
>>>>>>>>> the instruction involved. Since it's now at user level and
>>>>>>>>> the instruction is on a kernel level only page, there's a
>>>>>>>>> page fault and things go downhill from there.
>>>>>>>>>
>>>>>>>>> I partially implemented a solution to this before where O3
>>>>>>>>> reinstates the macroop it had been using when it restarts
>>>>>>>>> fetch. The problem here is that the path this kind of squash
>>>>>>>>> takes doesn't pass back the right information, and my
>>>>>>>>> attempts to fix that have been unsuccessful. The code that
>>>>>>>>> handles squashing in O3 is too complex, there's too much
>>>>>>>>> going in all directions, it's not always very clear what
>>>>>>>>> affect a change will have in unrelated situations, or which
>>>>>>>>> callsites are involved in a particular type of fault.
>>>>>>>>>
>>>>>>>>> To me, it seems like the first step in fixing this problem
>>>>>>>>> is to clean up how squashes are handled in O3 so that they
>>>>>>>>> can be made to consistently handle squashes in
>>>>>>>>> non-restartable macroops.
>>>>>>>>>
>>>>>>>>> Without having really dug into the specifics, I think we
>>>>>>>>> only need two pieces of information when squashing, a
>>>>>>>>> pointer to the guilty instruction and whether execution
>>>>>>>>> should start at or after it. It would start at it if the
>>>>>>>>> instruction needed to be reexecuted due to a memory
>>>>>>>>> dependence violation, for instance, and would start after it
>>>>>>>>> for faults, interrupts, or branch mispredicts. Any other
>>>>>>>>> information that's needed like sequence numbers or actual
>>>>>>>>> control flow targets can be retrieved from the instructions
>>>>>>>>> where needed without having to split everything out and pass
>>>>>>>>> them around individually.
>>>>>>>>>
>>>>>>>>> Is there any obvious problem with doing things this way? I
>>>>>>>>> don't think I'll personally have a lot of time to dedicate
>>>>>>>>> to this at the very least in the short term, but I wanted to
>>>>>>>>> get the conversation going so we know what to do when
>>>>>>>>> somebody has a chance to do it.
>>>>>>>>>
>>>>>>>>> Gabe

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
