Gabe, I think I understand the issue better now. I think this problem can still occur even when the prediction is correct. Isn't it possible that only part of the macro-op had been fetched before the mode was switched, in which case fetching the rest in user mode would still cause a problem? I agree with the solution you suggested, because with it the macro-op would not be fetched again.
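To check my understanding, here is a toy Python model of the mispredict case from Gabe's simplified iret example (quoted further down). It is only a sketch and not gem5 code: `ToyCPU`, `fetch_macroop`, `run_iret`, and the privilege constants are all names I made up for illustration, and the "squash" is reduced to restarting at a fixed microop index.

```python
# Toy model only: none of these names come from gem5 or O3.
KERNEL, USER = 0, 3  # assumed x86-style ring numbers, for illustration


class PageFault(Exception):
    """Raised when a fetch touches a kernel page from user level."""


# Gabe's simplified iret; indices 0-3 correspond to his steps 1-4.
IRET_MICROOPS = ["restore_priv", "branch_if_kernel", "restore_user_stack", "end"]


class ToyCPU:
    def __init__(self):
        self.mode = KERNEL
        self.fetches = 0

    def fetch_macroop(self):
        # The iret lives on a kernel page, so the fetch is checked
        # against whatever privilege level the CPU is in right now.
        if self.mode != KERNEL:
            raise PageFault("user-level fetch from a kernel page")
        self.fetches += 1
        return IRET_MICROOPS


def run_iret(cpu, reuse_macroop):
    """Run the iret through a mispredict of microop 2 and the restart."""
    macroop = cpu.fetch_macroop()  # initial fetch, still at kernel level
    cpu.mode = USER                # microop 1 completes: drop to user level
    restart_index = 2              # microop 2 mispredicted: restart at microop 3
    if not reuse_macroop:
        # Current behaviour in this model: fetch goes back to memory.
        macroop = cpu.fetch_macroop()
    # Proposed fix in this model: the squash hands back the macroop,
    # so no second memory access is needed.
    return macroop[restart_index:]
```

With `reuse_macroop=False` the second fetch raises `PageFault`, which is the failure Gabe described. With `reuse_macroop=True` the restart continues from the macroop that was already decoded and only one fetch happens.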
Thanks for all the explanation!

Nilay

On Tue, November 15, 2011 3:45 am, Gabe Black wrote:
> Even then, marking those microops non-speculative wouldn't fix the
> problem anyway. That would make O3 wait until they were at the head of
> the commit queue before executing them, but any branch could still
> easily be mispredicted. The mispredicted instruction wouldn't start
> executing, but that doesn't matter since a squash would still happen and
> fetch would still behave badly. You'd have to eliminate the need for
> branch prediction and memory dependence prediction altogether so that
> you'd never have to squash part of an iret, and that would mean allowing
> only one microop in flight at a time. O3 doesn't know how to do that,
> and if it did it would severely impact performance.
>
> Gabe
>
> On 11/15/11 01:20, Gabe Black wrote:
>> Serializing and being non-speculative are not the same thing and one
>> doesn't imply the other. The properties of the macroop do not apply to
>> all the microops. There's no reason at all to make an add microop in the
>> iret non-speculative. The microops which update state irreversibly are
>> non-speculative, but being non-speculative doesn't matter here. The
>> microop which changes the mode wasn't misspeculated, it was supposed to
>> execute. In a real CPU, iret or any other instruction complicated enough
>> for internal control flow would probably execute out of the microcode
>> ROM, and then there wouldn't be any need to fetch the instruction again
>> either.
>>
>> Gabe
>>
>> On 11/14/11 15:20, Nilay Vaish wrote:
>>> I checked AMD and Intel's processor manuals. Both state that iret is a
>>> serializing instruction, which means that iret will not be executed
>>> speculatively. I would expect even the micro-ops are executed in a
>>> non-speculative fashion.
>>>
>>> --
>>> Nilay
>>>
>>> On Mon, 14 Nov 2011, Steve Reinhardt wrote:
>>>
>>>> That would be one solution.
>>>> It would have some performance cost, but depending on how often
>>>> complex non-speculative macro-instructions get executed, it might
>>>> not be too bad.
>>>>
>>>> Another question is whether it makes sense to dynamically predict
>>>> internal micro-branches with the same predictor we use for
>>>> macro-instruction branches. I honestly don't know how our processors
>>>> do it, but I would not be surprised if the dynamic predictor only
>>>> worked on macro-instructions, and micro-branches had some static hint
>>>> bit or something like that. That doesn't directly affect this bug
>>>> (since you would still need recovery regardless of how you predicted
>>>> the micro-branch), but this discussion does make me wonder if our
>>>> model is realistic.
>>>>
>>>> Steve
>>>>
>>>> On Sun, Nov 13, 2011 at 10:54 PM, Nilay <[email protected]> wrote:
>>>>
>>>>> Well, I still don't get it. Do out-of-order CPUs speculate on iret?
>>>>> If iret is to be executed non-speculatively, I would expect the
>>>>> micro-ops that are part of iret to be executed non-speculatively.
>>>>>
>>>>> --
>>>>> Nilay
>>>>>
>>>>> On Sun, November 13, 2011 11:14 pm, Steve Reinhardt wrote:
>>>>>> Thanks for the more detailed explanation... that helped a lot.
>>>>>> Sounds to me like you're on the right track.
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Sun, Nov 13, 2011 at 8:20 PM, Gabe Black <[email protected]> wrote:
>>>>>>> No, we're not trying to undo anything. An example might help.
>>>>>>> Let's look at a dramatically simplified version of iret, the
>>>>>>> instruction that returns from an interrupt handler. The microops
>>>>>>> might do the following.
>>>>>>>
>>>>>>> 1. Restore prior privilege level.
>>>>>>> 2. If we were in kernel level, skip to 4.
>>>>>>> 3. Restore user level stack.
>>>>>>> 4. End.
>>>>>>>
>>>>>>> O3 fetches the bytes that go with iret, decodes that to a macroop,
>>>>>>> and starts picking microops out of it.
>>>>>>> Microop 1 is executed and drops to user level. Now microop 2 is
>>>>>>> executed, and O3 misspeculates that the branch is taken (for
>>>>>>> example). The mispredict is detected, and later microops in flight
>>>>>>> are squashed. O3 then attempts to restart where it should have
>>>>>>> gone, microop 3.
>>>>>>>
>>>>>>> Now, O3 looks at the PC involved and starts fetching the bytes
>>>>>>> which become the macroop which the microops are pulled from.
>>>>>>> Because microop 1 successfully completed, the CPU is now at user
>>>>>>> level, but because the iret is on a kernel page, it can't be
>>>>>>> accessed. The kernel gets a page fault.
>>>>>>>
>>>>>>> As I mentioned before, my partially implemented fix is to not only
>>>>>>> pass back the PC, but to also pass back the macroop fetch should
>>>>>>> use instead of making it refetch memory. The problem is that it's
>>>>>>> partially implemented, and the way squashes work in O3 makes it
>>>>>>> really tricky to implement it properly, or to tell whether or not
>>>>>>> it's implemented properly.
>>>>>>>
>>>>>>> Gabe
>>>>>>>
>>>>>>>
>>>>>>> On 11/13/11 19:21, Steve Reinhardt wrote:
>>>>>>>> I'd like to understand the issue a little better before
>>>>>>>> commenting on a solution.
>>>>>>>>
>>>>>>>> Gabe, when you say "instruction" in your original description, do
>>>>>>>> you mean micro-op?
>>>>>>>>
>>>>>>>> It seems to me that the fundamental problem is that we're trying
>>>>>>>> to undo the effects of a non-speculative micro-op, correct? So
>>>>>>>> the solution you're pursuing is that branch mispredictions only
>>>>>>>> roll back to the offending micro-op, and don't force the entire
>>>>>>>> macro-op containing that micro-op to re-execute?
>>>>>>>>
>>>>>>>> Is this predicted control flow entirely internal to the macro-op?
>>>>>>>> Or is this an RFI where we are integrating the control transfer
>>>>>>>> and the privilege change? If it is the latter, why does the RFI
>>>>>>>> need to get squashed at all?
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
