That would be one solution. It would have some performance cost, but depending on how often complex non-speculative macro-instructions get executed, it might not be too bad.
Another question is whether it makes sense to dynamically predict internal micro-branches with the same predictor we use for macro-instruction branches. I honestly don't know how our processors do it, but I would not be surprised if the dynamic predictor only worked on macro-instructions, and micro-branches had some static hint bit or something like that. That doesn't directly affect this bug (since you would still need recovery regardless of how you predicted the micro-branch), but this discussion does make me wonder if our model is realistic.

Steve

On Sun, Nov 13, 2011 at 10:54 PM, Nilay <[email protected]> wrote:

> Well, I still don't get it. Do out-of-order CPUs speculate on iret? If
> iret is to be executed non-speculatively, I would expect that the micro-ops
> that are part of iret are executed non-speculatively.
>
> --
> Nilay
>
> On Sun, November 13, 2011 11:14 pm, Steve Reinhardt wrote:
> > Thanks for the more detailed explanation... that helped a lot. Sounds to
> > me like you're on the right track.
> >
> > Steve
> >
> > On Sun, Nov 13, 2011 at 8:20 PM, Gabe Black <[email protected]> wrote:
> >
> >> No, we're not trying to undo anything. An example might help. Let's look
> >> at a dramatically simplified version of iret, the instruction that
> >> returns from an interrupt handler. The microops might do the following.
> >>
> >> 1. Restore prior privilege level.
> >> 2. If we were in kernel level, skip to 4.
> >> 3. Restore user level stack.
> >> 4. End.
> >>
> >> O3 fetches the bytes that go with iret, decodes that to a macroop, and
> >> starts picking microops out of it. Microop 1 is executed and drops to
> >> user level. Now microop 2 is executed, and O3 misspeculates that the
> >> branch is taken (for example). The mispredict is detected, and later
> >> microops in flight are squashed. O3 then attempts to restart where it
> >> should have gone, microop 3.
> >>
> >> Now, O3 looks at the PC involved and starts fetching the bytes which
> >> become the macroop which the microops are pulled from. Because microop 1
> >> successfully completed, the CPU is now at user level, but because the
> >> iret is on a kernel page, it can't be accessed. The kernel gets a page
> >> fault.
> >>
> >> As I mentioned before, my partially implemented fix is to not only pass
> >> back the PC, but to also pass back the macroop fetch should use instead
> >> of making it refetch memory. The problem is that it's partially
> >> implemented, and the way squashes work in O3 makes it really tricky to
> >> implement it properly, or to tell whether or not it's implemented
> >> properly.
> >>
> >> Gabe
> >>
> >> On 11/13/11 19:21, Steve Reinhardt wrote:
> >> > I'd like to understand the issue a little better before commenting on a
> >> > solution.
> >> >
> >> > Gabe, when you say "instruction" in your original description, do you
> >> > mean micro-op?
> >> >
> >> > It seems to me that the fundamental problem is that we're trying to undo
> >> > the effects of a non-speculative micro-op, correct? So the solution
> >> > you're pursuing is that branch mispredictions only roll back to the
> >> > offending micro-op, and don't force the entire macro-op containing that
> >> > micro-op to re-execute?
> >> >
> >> > Is this predicted control flow entirely internal to the macro-op? Or is
> >> > this an RFI where we are integrating the control transfer and the
> >> > privilege change? If it is the latter, why does the RFI need to get
> >> > squashed at all?
> >> >
> >> > Steve
> >> >
> >> > On Sun, Nov 13, 2011 at 4:34 PM, Gabe Black <[email protected]> wrote:
> >> >
> >> >> Yes, this is an existing bug and the branch predictor just pokes things
> >> >> in the right way to expose it. The macroop isn't passed back in this
> >> >> particular case, and with the code the way it is, it's difficult to
> >> >> even tell that that's the case, let alone how to fix it. Cleaning
> >> >> things up won't fix the problem itself, but it will make fixing the
> >> >> actual problem tractable.
> >> >>
> >> >> Gabe
> >> >>
> >> >> On 11/13/11 16:16, Ali Saidi wrote:
> >> >>> I think this bug is just latent in the code right now and the branch
> >> >>> predictor change runs into it (this patch causes that branch to be
> >> >>> mispredicted). In any case I think the issue exists today and it's
> >> >>> just luck that it works currently.
> >> >>>
> >> >>> Looking at your list I imagine you should be able to recover most
> >> >>> things from the dyninst, however I don't know if that is actually the
> >> >>> case. Excepted that the squashing mechanisms should be cleaned up, I'm
> >> >>> not sure how that is actually going to solve the problem. Don't we
> >> >>> currently send back the instruction? With the current instructions
> >> >>> can't you figure out the macro-op it belongs to?
> >> >>>
> >> >>> Ali
> >> >>>
> >> >>> On Nov 13, 2011, at 5:40 PM, Gabe Black wrote:
> >> >>>
> >> >>>> Hey folks. Ali has had a change out for a while ("Fix several Branch
> >> >>>> Predictor issues") which improves branch predictor performance
> >> >>>> substantially but breaks X86_FS on O3. It turns out the problem is
> >> >>>> that an instruction is started which returns from kernel to user
> >> >>>> level and is microcoded. The instruction is fetched from the kernel's
> >> >>>> address space successfully and starts to execute, along the way
> >> >>>> dropping down to user mode. Some microops later, there's some microop
> >> >>>> control flow which O3 mispredicts. When it squashes the mispredict
> >> >>>> and tries to restart, it first tries to refetch the instruction
> >> >>>> involved. Since it's now at user level and the instruction is on a
> >> >>>> kernel level only page, there's a page fault and things go downhill
> >> >>>> from there.
> >> >>>>
> >> >>>> I partially implemented a solution to this before where O3 reinstates
> >> >>>> the macroop it had been using when it restarts fetch. The problem
> >> >>>> here is that the path this kind of squash takes doesn't pass back the
> >> >>>> right information, and my attempts to fix that have been
> >> >>>> unsuccessful. The code that handles squashing in O3 is too complex,
> >> >>>> there's too much going in all directions, it's not always very clear
> >> >>>> what effect a change will have in unrelated situations, or which
> >> >>>> callsites are involved in a particular type of fault.
> >> >>>>
> >> >>>> To me, it seems like the first step in fixing this problem is to
> >> >>>> clean up how squashes are handled in O3 so that they can be made to
> >> >>>> consistently handle squashes in non-restartable macroops.
> >> >>>>
> >> >>>> Without having really dug into the specifics, I think we only need
> >> >>>> two pieces of information when squashing, a pointer to the guilty
> >> >>>> instruction and whether execution should start at or after it. It
> >> >>>> would start at it if the instruction needed to be reexecuted due to a
> >> >>>> memory dependence violation, for instance, and would start after it
> >> >>>> for faults, interrupts, or branch mispredicts. Any other information
> >> >>>> that's needed like sequence numbers or actual control flow targets
> >> >>>> can be retrieved from the instructions where needed without having to
> >> >>>> split everything out and pass them around individually.
> >> >>>>
> >> >>>> Is there any obvious problem with doing things this way? I don't
> >> >>>> think I'll personally have a lot of time to dedicate to this at the
> >> >>>> very least in the short term, but I wanted to get the conversation
> >> >>>> going so we know what to do when somebody has a chance to do it.
> >> >>>>
> >> >>>> Gabe
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
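
For anyone reading this thread later, here is a rough sketch of the two ideas from Gabe's messages: a squash request that carries only a reference to the guilty instruction plus an at/after flag, and a fetch restart that reuses the macroop attached to that instruction instead of refetching its bytes. This is a toy model in Python, not gem5 code. Every name in it (Resume, MicroOp, SquashRequest, PageFault, restart_fetch) is hypothetical and made up for illustration, and the privilege check is reduced to a single boolean.

```python
from dataclasses import dataclass
from enum import Enum


class Resume(Enum):
    AT = "at"        # re-execute the guilty instruction (e.g. memory dependence violation)
    AFTER = "after"  # continue past it (fault, interrupt, branch mispredict)


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical stand-in for a dynamic instruction; not a gem5 class."""
    macroop: str   # parent macroop, carried along with the microop
    upc: int       # micro-PC of this microop inside the macroop
    seq_num: int   # sequence number, recoverable from the instruction itself
    next_upc: int  # resolved successor micro-PC (the actual branch outcome)


@dataclass(frozen=True)
class SquashRequest:
    """The only two pieces of information passed around on a squash."""
    guilty: MicroOp
    resume: Resume

    def restart_upc(self) -> int:
        # Start at the guilty microop, or at its resolved successor.
        if self.resume is Resume.AT:
            return self.guilty.upc
        return self.guilty.next_upc


class PageFault(Exception):
    pass


def restart_fetch(req, privilege, kernel_only_page, reuse_macroop):
    """Return (macroop, micro-PC) at which fetch restarts after a squash."""
    if not reuse_macroop and kernel_only_page and privilege == "user":
        # Refetching the macroop bytes now fails: microop 1 already dropped
        # the CPU to user level, but the bytes live on a kernel-only page.
        raise PageFault("refetch of macroop bytes from a kernel-only page")
    # Reinstating the macroop taken from the guilty instruction needs no
    # memory access, so the current privilege level does not matter.
    return (req.guilty.macroop, req.restart_upc())


# The simplified iret example: microop 2 is the mispredicted micro-branch,
# and execution should continue after it, at microop 3.
iret_branch = MicroOp(macroop="iret", upc=2, seq_num=101, next_upc=3)
req = SquashRequest(guilty=iret_branch, resume=Resume.AFTER)
print(restart_fetch(req, "user", True, reuse_macroop=True))  # prints ('iret', 3)
```

Calling restart_fetch with reuse_macroop=False in the same situation raises PageFault, which corresponds to the failure described in the thread. The sketch says nothing about how hard it is to thread this information through the real squash paths in O3, which is the actual difficulty raised above.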
