I checked AMD's and Intel's processor manuals. Both state that iret is a
serializing instruction, which means that iret will not be executed
speculatively. I would expect that even its micro-ops are executed in a
non-speculative fashion.
--
Nilay
On Mon, 14 Nov 2011, Steve Reinhardt wrote:
That would be one solution. It would have some performance cost, but
depending on how often complex non-speculative macro-instructions get
executed, it might not be too bad.
Another question is whether it makes sense to dynamically predict internal
micro-branches with the same predictor we use for macro-instruction
branches. I honestly don't know how our processors do it, but I would not
be surprised if the dynamic predictor only worked on macro-instructions,
and micro-branches had some static hint bit or something like that. That
doesn't directly affect this bug (since you would still need recovery
regardless of how you predicted the micro-branch), but this discussion does
make me wonder if our model is realistic.
Steve
On Sun, Nov 13, 2011 at 10:54 PM, Nilay <[email protected]> wrote:
Well, I still don't get it. Do out-of-order CPUs speculate on iret? If
iret is to be executed non-speculatively, I would expect the micro-ops that
are part of iret to be executed non-speculatively as well.
--
Nilay
On Sun, November 13, 2011 11:14 pm, Steve Reinhardt wrote:
Thanks for the more detailed explanation... that helped a lot. Sounds to
me like you're on the right track.
Steve
On Sun, Nov 13, 2011 at 8:20 PM, Gabe Black <[email protected]>
wrote:
No, we're not trying to undo anything. An example might help. Let's look
at a dramatically simplified version of iret, the instruction that
returns from an interrupt handler. The microops might do the following.
1. Restore prior privilege level.
2. If we were in kernel level, skip to 4.
3. Restore user level stack.
4. End.
O3 fetches the bytes that go with iret, decodes that to a macroop, and
starts picking microops out of it. Microop 1 is executed and drops to
user level. Now microop 2 is executed, and O3 misspeculates that the
branch is taken (for example). The mispredict is detected, and later
microops in flight are squashed. O3 then attempts to restart where it
should have gone, microop 3.
Now, O3 looks at the PC involved and starts fetching the bytes which
become the macroop which the microops are pulled from. Because microop 1
successfully completed, the CPU is now at user level, but because the
iret is on a kernel page, it can't be accessed. The kernel gets a page
fault.
As I mentioned before, my partially implemented fix is to not only pass
back the PC, but to also pass back the macroop that fetch should use
instead of making it refetch memory. The problem is that it's partially
implemented, and the way squashes work in O3 makes it really tricky to
implement it properly, or to tell whether or not it's implemented
properly.
Gabe
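A minimal sketch of the sequence described above, written as a toy Python model and not as gem5 code. Every name here (ToyCpu, run_iret, keep_macroop, the microop labels) is hypothetical and only mirrors the four simplified microops in the example. With keep_macroop=False the restart refetches the iret bytes and faults; with keep_macroop=True the already-decoded macroop is reused, which is the idea behind the partially implemented fix.

```python
from dataclasses import dataclass

KERNEL, USER = 0, 3  # toy privilege encoding (assumption for illustration)


class PageFault(Exception):
    """Raised when fetch touches a kernel-only page from user level."""


@dataclass
class ToyCpu:
    level: int = KERNEL
    fetches: int = 0

    def fetch_macroop(self):
        """Fetch the iret bytes (on a kernel-only page) and decode them."""
        self.fetches += 1
        if self.level != KERNEL:
            raise PageFault("iret bytes are on a kernel-only page")
        return ["restore_priv", "branch_if_kernel", "restore_user_stack", "end"]


def run_iret(cpu, keep_macroop):
    """Run the simplified iret with one mispredicted microop branch.

    keep_macroop=False: restart refetches from memory (the failing path).
    keep_macroop=True:  restart reuses the macroop carried by the squash.
    """
    macroop = cpu.fetch_macroop()
    executed = []

    # Microop 1: restore prior privilege level (returning to user level).
    cpu.level = USER
    executed.append(macroop[0])

    # Microop 2: the branch is wrongly predicted taken, then resolved as
    # not taken; younger microops in flight are squashed at this point.
    executed.append(macroop[1])

    # Restart at microop 3, either from the kept macroop or by refetching.
    if not keep_macroop:
        macroop = cpu.fetch_macroop()  # CPU is at user level now: faults
    executed.extend(macroop[2:])
    return executed


if __name__ == "__main__":
    try:
        run_iret(ToyCpu(), keep_macroop=False)
    except PageFault as exc:
        print("refetch path:", exc)
    print("kept-macroop path:", run_iret(ToyCpu(), keep_macroop=True))
```

The model deliberately ignores how the macroop would actually travel with the squash through O3's stages, which is the part described above as hard to get right.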
On 11/13/11 19:21, Steve Reinhardt wrote:
I'd like to understand the issue a little better before commenting on a
solution.
Gabe, when you say "instruction" in your original description, do you mean
micro-op?
It seems to me that the fundamental problem is that we're trying to undo
the effects of a non-speculative micro-op, correct? So the solution you're
pursuing is that branch mispredictions only roll back to the offending
micro-op, and don't force the entire macro-op containing that micro-op to
re-execute?
Is this predicted control flow entirely internal to the macro-op? Or is
this an RFI where we are integrating the control transfer and the privilege
change? If it is the latter, why does the RFI need to get squashed at all?
Steve
On Sun, Nov 13, 2011 at 4:34 PM, Gabe Black <[email protected]>
wrote:
Yes, this is an existing bug and the branch predictor just pokes things
in the right way to expose it. The macroop isn't passed back in this
particular case, and with the code the way it is, it's difficult to even
tell that that's the case, let alone how to fix it. Cleaning things up
won't fix the problem itself, but it will make fixing the actual problem
tractable.
Gabe
On 11/13/11 16:16, Ali Saidi wrote:
I think this bug is already latent in the code right now and the branch
predictor change runs into it (this patch causes that branch to be
mispredicted). In any case I think the issue exists today and it's just
luck that it works currently.
Looking at your list I imagine you should be able to recover most things
from the dyninst, however I don't know if that is actually the case.
Accepting that the squashing mechanisms should be cleaned up, I'm not sure
how that is actually going to solve the problem. Don't we currently send
back the instruction? With the current instructions can't you figure out
the macro-op it belongs to?
Ali
On Nov 13, 2011, at 5:40 PM, Gabe Black wrote:
Hey folks. Ali has had a change out for a while ("Fix several Branch
Predictor issues") which improves branch predictor performance
substantially but breaks X86_FS on O3. It turns out the problem is that
an instruction is started which returns from kernel to user level and is
microcoded. The instruction is fetched from the kernel's address space
successfully and starts to execute, along the way dropping down to user
mode. Some microops later, there's some microop control flow which O3
mispredicts. When it squashes the mispredict and tries to restart, it
first tries to refetch the instruction involved. Since it's now at user
level and the instruction is on a kernel-level-only page, there's a page
fault and things go downhill from there.
I partially implemented a solution to this before where O3 reinstates
the macroop it had been using when it restarts fetch. The problem here
is that the path this kind of squash takes doesn't pass back the right
information, and my attempts to fix that have been unsuccessful. The
code that handles squashing in O3 is too complex, there's too much going
on in all directions, and it's not always very clear what effect a change
will have in unrelated situations, or which callsites are involved in a
particular type of fault.
To me, it seems like the first step in fixing this problem is to clean
up how squashes are handled in O3 so that they can be made to
consistently handle squashes in non-restartable macroops.
Without having really dug into the specifics, I think we only need two
pieces of information when squashing: a pointer to the guilty
instruction and whether execution should start at or after it. It would
start at it if the instruction needed to be reexecuted due to a memory
dependence violation, for instance, and would start after it for faults,
interrupts, or branch mispredicts. Any other information that's needed
like sequence numbers or actual control flow targets can be retrieved
from the instructions where needed without having to split everything
out and pass them around individually.
Is there any obvious problem with doing things this way? I don't think
I'll personally have a lot of time to dedicate to this at the very least
in the short term, but I wanted to get the conversation going so we know
what to do when somebody has a chance to do it.
Gabe
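The two-field squash record proposed above could look roughly like the following. This is a toy Python sketch under stated assumptions, not O3's actual interface: ToyInst, SquashInfo, SquashCause, make_squash and restart_point are all hypothetical names, and next_pc simply stands in for wherever execution continues after the instruction (a branch target, or a handler entry for faults and interrupts).

```python
from dataclasses import dataclass
from enum import Enum, auto


class SquashCause(Enum):
    MEM_ORDER_VIOLATION = auto()
    FAULT = auto()
    INTERRUPT = auto()
    BRANCH_MISPREDICT = auto()


@dataclass
class ToyInst:
    seq_num: int   # program-order sequence number
    pc: int        # address of this instruction
    next_pc: int   # where execution continues after it (assumed resolved)
    macroop: str   # stand-in for a pointer to the parent macroop


@dataclass
class SquashInfo:
    inst: ToyInst     # the "guilty" instruction
    restart_at: bool  # True: re-execute inst; False: resume after it


def make_squash(inst, cause):
    """Only a memory dependence violation re-executes the instruction."""
    return SquashInfo(inst, restart_at=(cause is SquashCause.MEM_ORDER_VIOLATION))


def restart_point(sq):
    """Derive (resume PC, youngest seq num to keep, macroop) from the inst.

    Nothing else is passed around; everything is read off the instruction.
    """
    if sq.restart_at:
        return sq.inst.pc, sq.inst.seq_num - 1, sq.inst.macroop
    return sq.inst.next_pc, sq.inst.seq_num, sq.inst.macroop


if __name__ == "__main__":
    inst = ToyInst(seq_num=10, pc=0x100, next_pc=0x108, macroop="iret")
    print(restart_point(make_squash(inst, SquashCause.BRANCH_MISPREDICT)))
    print(restart_point(make_squash(inst, SquashCause.MEM_ORDER_VIOLATION)))
```

Because the macroop is reachable from the instruction, the restart path in this sketch never needs to refetch it from memory, which is what matters for non-restartable macroops.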
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev