Well, I still don't get it. Do out-of-order CPUs speculate on iret? If iret is to be executed non-speculatively, I would expect the micro-ops that are part of iret to be executed non-speculatively as well.
--
Nilay

On Sun, November 13, 2011 11:14 pm, Steve Reinhardt wrote:
> Thanks for the more detailed explanation... that helped a lot. Sounds to
> me like you're on the right track.
>
> Steve
>
> On Sun, Nov 13, 2011 at 8:20 PM, Gabe Black wrote:
>
>> No, we're not trying to undo anything. An example might help. Let's look
>> at a dramatically simplified version of iret, the instruction that
>> returns from an interrupt handler. The microops might do the following.
>>
>> 1. Restore prior privilege level.
>> 2. If we were in kernel level, skip to 4.
>> 3. Restore user level stack.
>> 4. End.
>>
>> O3 fetches the bytes that go with iret, decodes that to a macroop, and
>> starts picking microops out of it. Microop 1 is executed and drops to
>> user level. Now microop 2 is executed, and O3 misspeculates that the
>> branch is taken (for example). The mispredict is detected, and later
>> microops in flight are squashed. O3 then attempts to restart where it
>> should have gone, microop 3.
>>
>> Now, O3 looks at the PC involved and starts fetching the bytes which
>> become the macroop which the microops are pulled from. Because microop 1
>> successfully completed, the CPU is now at user level, but because the
>> iret is on a kernel page, it can't be accessed. The kernel gets a page
>> fault.
>>
>> As I mentioned before, my partially implemented fix is to not only pass
>> back the PC, but to also pass back the macroop fetch should use instead
>> of making it refetch memory. The problem is that it's partially
>> implemented, and the way squashes work in O3 makes it really tricky to
>> implement it properly, or to tell whether or not it's implemented
>> properly.
>>
>> Gabe
>>
>>
>> On 11/13/11 19:21, Steve Reinhardt wrote:
>> > I'd like to understand the issue a little better before commenting
>> > on a solution.
>> >
>> > Gabe, when you say "instruction" in your original description, do
>> > you mean micro-op?
>> >
>> > It seems to me that the fundamental problem is that we're trying to
>> > undo the effects of a non-speculative micro-op, correct? So the
>> > solution you're pursuing is that branch mispredictions only roll
>> > back to the offending micro-op, and don't force the entire macro-op
>> > containing that micro-op to re-execute?
>> >
>> > Is this predicted control flow entirely internal to the macro-op?
>> > Or is this an RFI where we are integrating the control transfer and
>> > the privilege change? If it is the latter, why does the RFI need to
>> > get squashed at all?
>> >
>> > Steve
>> >
>> > On Sun, Nov 13, 2011 at 4:34 PM, Gabe Black wrote:
>> >
>> >> Yes, this is an existing bug and the branch predictor just pokes
>> >> things in the right way to expose it. The macroop isn't passed back
>> >> in this particular case, and with the code the way it is, it's
>> >> difficult to even tell that that's the case, let alone how to fix
>> >> it. Cleaning things up won't fix the problem itself, but it will
>> >> make fixing the actual problem tractable.
>> >>
>> >> Gabe
>> >>
>> >> On 11/13/11 16:16, Ali Saidi wrote:
>> >>> I think this bug is just latently in the code right now and the
>> >>> branch predictor change runs into it (this patch causes that
>> >>> branch to be mispredicted). In any case I think the issue exists
>> >>> today and it's just luck that it works currently.
>> >>>
>> >>> Looking at your list I imagine you should be able to recover most
>> >>> things from the dyninst, however I don't know if that is actually
>> >>> the case. Accepted that the squashing mechanisms should be cleaned
>> >>> up, I'm not sure how that is actually going to solve the problem.
>> >>> Don't we currently send back the instruction? With the current
>> >>> instructions can't you figure out the macro-op it belongs to?
>> >>>
>> >>> Ali
>> >>>
>> >>>
>> >>>
>> >>> On Nov 13, 2011, at 5:40 PM, Gabe Black wrote:
>> >>>
>> >>>> Hey folks. Ali has had a change out for a while ("Fix several
>> >>>> Branch Predictor issues") which improves branch predictor
>> >>>> performance substantially but breaks X86_FS on O3. It turns out
>> >>>> the problem is that an instruction is started which returns from
>> >>>> kernel to user level and is microcoded. The instruction is
>> >>>> fetched from the kernel's address space successfully and starts
>> >>>> to execute, along the way dropping down to user mode. Some
>> >>>> microops later, there's some microop control flow which O3
>> >>>> mispredicts. When it squashes the mispredict and tries to
>> >>>> restart, it first tries to refetch the instruction involved.
>> >>>> Since it's now at user level and the instruction is on a kernel
>> >>>> level only page, there's a page fault and things go downhill
>> >>>> from there.
>> >>>>
>> >>>> I partially implemented a solution to this before where O3
>> >>>> reinstates the macroop it had been using when it restarts fetch.
>> >>>> The problem here is that the path this kind of squash takes
>> >>>> doesn't pass back the right information, and my attempts to fix
>> >>>> that have been unsuccessful. The code that handles squashing in
>> >>>> O3 is too complex, there's too much going in all directions,
>> >>>> it's not always very clear what effect a change will have in
>> >>>> unrelated situations, or which callsites are involved in a
>> >>>> particular type of fault.
>> >>>>
>> >>>> To me, it seems like the first step in fixing this problem is to
>> >>>> clean up how squashes are handled in O3 so that they can be made
>> >>>> to consistently handle squashes in non-restartable macroops.
>> >>>>
>> >>>> Without having really dug into the specifics, I think we only
>> >>>> need two pieces of information when squashing, a pointer to the
>> >>>> guilty instruction and whether execution should start at or
>> >>>> after it. It would start at it if the instruction needed to be
>> >>>> reexecuted due to a memory dependence violation, for instance,
>> >>>> and would start after it for faults, interrupts, or branch
>> >>>> mispredicts. Any other information that's needed like sequence
>> >>>> numbers or actual control flow targets can be retrieved from the
>> >>>> instructions where needed without having to split everything out
>> >>>> and pass them around individually.
>> >>>>
>> >>>> Is there any obvious problem with doing things this way? I don't
>> >>>> think I'll personally have a lot of time to dedicate to this at
>> >>>> the very least in the short term, but I wanted to get the
>> >>>> conversation going so we know what to do when somebody has a
>> >>>> chance to do it.
>> >>>>
>> >>>> Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
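
The failure Gabe describes in the quoted thread, together with the two pieces of squash information he proposes (the guilty instruction, and whether to restart at or after it), can be sketched as a toy Python model. This is only an illustration under stated assumptions and is not gem5 code: ToyFetch, MacroOp, DynInst, Squash, IRET_PC and run_scenario are all hypothetical names, microops are numbered from 0 rather than 1, and the page permission check is reduced to a single comparison.

```python
from dataclasses import dataclass
from typing import Tuple

KERNEL, USER = 0, 3  # privilege levels (x86 ring numbers)


class PageFault(Exception):
    """Raised when fetch touches a page the current mode may not access."""


@dataclass(frozen=True)
class MacroOp:
    name: str
    microops: Tuple[str, ...]


@dataclass
class DynInst:
    # Hypothetical stand-in for an in-flight microop.
    pc: int            # address of the enclosing macroop
    upc: int           # index of this microop inside the macroop
    macroop: MacroOp   # the macroop it was pulled from
    next_upc: int      # resolved successor (actual control flow target)


@dataclass
class Squash:
    # The two pieces of information proposed in the thread.
    inst: DynInst        # the guilty instruction
    restart_after: bool  # False: re-execute it, True: resume after it


IRET_PC = 0x1000  # assumed to sit on a kernel-only page
IRET = MacroOp("iret", ("restore_privilege", "skip_if_kernel",
                        "restore_user_stack", "end"))


class ToyFetch:
    def __init__(self) -> None:
        self.mode = KERNEL

    def fetch_macroop(self, pc: int) -> MacroOp:
        # Simplified permission check: the iret page is kernel-only.
        if pc == IRET_PC and self.mode != KERNEL:
            raise PageFault(f"user-mode fetch of kernel page at {pc:#x}")
        return IRET

    def restart(self, squash: Squash, reuse_macroop: bool) -> str:
        inst = squash.inst
        upc = inst.next_upc if squash.restart_after else inst.upc
        # Either reuse the macroop carried by the instruction, or go
        # back to memory for it (the behaviour that exposes the bug).
        macroop = inst.macroop if reuse_macroop else self.fetch_macroop(inst.pc)
        return macroop.microops[upc]


def run_scenario(reuse_macroop: bool) -> str:
    cpu = ToyFetch()
    macroop = cpu.fetch_macroop(IRET_PC)  # fetched at kernel level: fine
    cpu.mode = USER                       # microop 0 drops privilege
    # Microop 1 is the mispredicted branch; its real successor is microop 2.
    branch = DynInst(pc=IRET_PC, upc=1, macroop=macroop, next_upc=2)
    return cpu.restart(Squash(branch, restart_after=True), reuse_macroop)
```

In this sketch, run_scenario(reuse_macroop=False) raises PageFault because the restart refetches from the kernel-only page while at user level, whereas run_scenario(reuse_macroop=True) resumes at the "restore_user_stack" microop without touching memory.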
