No, we're not trying to undo anything. An example might help. Let's look at a dramatically simplified version of iret, the instruction that returns from an interrupt handler. The microops might do the following:

1. Restore prior privilege level.
2. If we were in kernel level, skip to 4.
3. Restore user level stack.
4. End.

O3 fetches the bytes that go with iret, decodes them into a macroop, and starts picking microops out of it. Microop 1 is executed and drops to user level. Now microop 2 is executed, and O3 misspeculates that the branch is taken (for example). The mispredict is detected, and later microops in flight are squashed. O3 then attempts to restart where it should have gone, microop 3. To do that, O3 looks at the PC involved and starts fetching the bytes which become the macroop that the microops are pulled from. Because microop 1 successfully completed, the CPU is now at user level, and because the iret is on a kernel page, those bytes can't be accessed. The kernel gets a page fault.

As I mentioned before, my partially implemented fix is to pass back not only the PC but also the macroop that fetch should use, instead of making it refetch memory. The problem is that it's only partially implemented, and the way squashes work in O3 makes it really tricky to implement properly, or to tell whether or not it's implemented properly.

Gabe

On 11/13/11 19:21, Steve Reinhardt wrote:
> I'd like to understand the issue a little better before commenting on a
> solution.
>
> Gabe, when you say "instruction" in your original description, do you mean
> micro-op?
>
> It seems to me that the fundamental problem is that we're trying to undo
> the effects of a non-speculative micro-op, correct? So the solution you're
> pursuing is that branch mispredictions only roll back to the offending
> micro-op, and don't force the entire macro-op containing that micro-op to
> re-execute?
>
> Is this predicted control flow entirely internal to the macro-op? Or is
> this an RFI where we are integrating the control transfer and the privilege
> change? If it is the latter, why does the RFI need to get squashed at all?
>
> Steve
>
> On Sun, Nov 13, 2011 at 4:34 PM, Gabe Black <[email protected]> wrote:
>
>> Yes, this is an existing bug and the branch predictor just pokes things
>> in the right way to expose it. The macroop isn't passed back in this
>> particular case, and with the code the way it is, it's difficult to even
>> tell that that's the case, let alone how to fix it. Cleaning things up
>> won't fix the problem itself, but it will make fixing the actual problem
>> tractable.
>>
>> Gabe
>>
>> On 11/13/11 16:16, Ali Saidi wrote:
>>> I think this bug is just latently in the code right now and the branch
>>> predictor change runs into it (this patch causes that branch to be
>>> mispredicted). In any case I think the issue exists today and it's just
>>> luck that it works currently.
>>>
>>> Looking at your list I imagine you should be able to recover most things
>>> from the dyninst, however I don't know if that is actually the case.
>>> Excepted that the squashing mechanisms should be cleaned up, I'm not sure
>>> how that is actually going to solve the problem. Don't we currently send
>>> back the instruction? With the current instructions can't you figure out
>>> the macro-op it belongs to?
>>>
>>> Ali
>>>
>>> On Nov 13, 2011, at 5:40 PM, Gabe Black wrote:
>>>
>>>> Hey folks. Ali has had a change out for a while ("Fix several Branch
>>>> Predictor issues") which improves branch predictor performance
>>>> substantially but breaks X86_FS on O3. It turns out the problem is that
>>>> an instruction is started which returns from kernel to user level and is
>>>> microcoded. The instruction is fetched from the kernel's address space
>>>> successfully and starts to execute, along the way dropping down to user
>>>> mode. Some microops later, there's some microop control flow which O3
>>>> mispredicts. When it squashes the mispredict and tries to restart, it
>>>> first tries to refetch the instruction involved.
>>>> Since it's now at user level and the instruction is on a kernel level
>>>> only page, there's a page fault and things go downhill from there.
>>>>
>>>> I partially implemented a solution to this before where O3 reinstates
>>>> the macroop it had been using when it restarts fetch. The problem here
>>>> is that the path this kind of squash takes doesn't pass back the right
>>>> information, and my attempts to fix that have been unsuccessful. The
>>>> code that handles squashing in O3 is too complex, there's too much going
>>>> in all directions, it's not always very clear what effect a change will
>>>> have in unrelated situations, or which callsites are involved in a
>>>> particular type of fault.
>>>>
>>>> To me, it seems like the first step in fixing this problem is to clean
>>>> up how squashes are handled in O3 so that they can be made to
>>>> consistently handle squashes in non-restartable macroops.
>>>>
>>>> Without having really dug into the specifics, I think we only need two
>>>> pieces of information when squashing, a pointer to the guilty
>>>> instruction and whether execution should start at or after it. It would
>>>> start at it if the instruction needed to be reexecuted due to a memory
>>>> dependence violation, for instance, and would start after it for faults,
>>>> interrupts, or branch mispredicts. Any other information that's needed
>>>> like sequence numbers or actual control flow targets can be retrieved
>>>> from the instructions where needed without having to split everything
>>>> out and pass them around individually.
>>>>
>>>> Is there any obvious problem with doing things this way? I don't think
>>>> I'll personally have a lot of time to dedicate to this at the very least
>>>> in the short term, but I wanted to get the conversation going so we know
>>>> what to do when somebody has a chance to do it.
>>>>
>>>> Gabe
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
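
The iret walkthrough at the top of this message can be sketched as a toy model. This is only an illustration in Python, not gem5 code: Cpu, Macroop, fetch_macroop, restart_after_squash, run_iret and the PC value are all made-up names and assumptions, and the microop sequence is the dramatically simplified one from the example. It contrasts restarting by refetching from memory with restarting from a macroop that the squash passes back.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical privilege levels for the toy model.
KERNEL, USER = 0, 3


class PageFault(Exception):
    """Raised when fetch touches a kernel-only page from user level."""


@dataclass(frozen=True)
class Macroop:
    name: str
    microops: tuple  # microop descriptions; numbered from 1 as in the email


@dataclass
class Cpu:
    level: int = KERNEL


# Assumption: the simplified iret from the example, living on a kernel-only page.
IRET = Macroop("iret", (
    "restore prior privilege level",
    "if kernel level, skip to 4",
    "restore user level stack",
    "end",
))
IRET_PC = 0x1000  # made-up address


def fetch_macroop(cpu: Cpu, pc: int) -> Macroop:
    """Fetch and decode the macroop at pc, honoring page permissions."""
    if cpu.level != KERNEL:
        raise PageFault(f"user-level fetch from kernel page at {pc:#x}")
    return IRET


def restart_after_squash(cpu: Cpu, pc: int, upc: int,
                         macroop: Optional[Macroop] = None) -> str:
    """Resume at microop number upc of the macroop at pc.

    If the squash passed the macroop back, reuse it. Otherwise fetch has
    to go back to memory, which is the step that faults.
    """
    if macroop is None:
        macroop = fetch_macroop(cpu, pc)
    return macroop.microops[upc - 1]


def run_iret(pass_back_macroop: bool) -> str:
    cpu = Cpu(level=KERNEL)
    macroop = fetch_macroop(cpu, IRET_PC)  # fine: still at kernel level
    cpu.level = USER                       # microop 1 completes
    # Microop 2 is mispredicted as taken; the correct restart point is microop 3.
    passed = macroop if pass_back_macroop else None
    return restart_after_squash(cpu, IRET_PC, 3, passed)
```

Calling run_iret(True) resumes at "restore user level stack", while run_iret(False) raises PageFault, because the refetch happens after the drop to user level. The real difficulty described in the thread, getting every squash path in O3 to carry the macroop back, is not modeled here.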
