Re: [gem5-dev] squashing bug in O3

Nilay Vaish Mon, 14 Nov 2011 15:20:26 -0800

I checked AMD and Intel's processor manuals. Both state that iret is a
serializing instruction, which means that iret will not be executed
speculatively. I would expect even the micro-ops are executed in a
non-speculative fashion.


--
Nilay


On Mon, 14 Nov 2011, Steve Reinhardt wrote:

That would be one solution.  It would have some performance cost, but
depending on how often complex non-speculative macro-instructions get
executed, it might not be too bad.

Another question is whether it makes sense to dynamically predict internal
micro-branches with the same predictor we use for macro-instruction
branches.  I honestly don't know how our processors do it, but I would not
be surprised if the dynamic predictor only worked on macro-instructions,
and micro-branches had some static hint bit or something like that.  That
doesn't directly affect this bug (since you would still need recovery
regardless of how you predicted the micro-branch), but this discussion does
make me wonder if our model is realistic.

Steve

On Sun, Nov 13, 2011 at 10:54 PM, Nilay <[email protected]> wrote:

Well, I still don't get it. Do out-of-order CPUs speculate on iret? If
iret is to be executed non-speculatively, I would expect the micro-ops that
are part of iret to be executed non-speculatively as well.

--
Nilay

On Sun, November 13, 2011 11:14 pm, Steve Reinhardt wrote:

Thanks for the more detailed explanation... that helped a lot.  Sounds to
me like you're on the right track.

Steve

On Sun, Nov 13, 2011 at 8:20 PM, Gabe Black <[email protected]> wrote:

No, we're not trying to undo anything. An example might help. Let's look
at a dramatically simplified version of iret, the instruction that
returns from an interrupt handler. The microops might do the following.

1. Restore prior privilege level.
2. If we were in kernel level, skip to 4.
3. Restore user level stack.
4. End.

O3 fetches the bytes that go with iret, decodes that to a macroop, and
starts picking microops out of it. Microop 1 is executed and drops to
user level. Now microop 2 is executed, and O3 misspeculates that the
branch is taken (for example). The mispredict is detected, and later
microops in flight are squashed. O3 then attempts to restart where it
should have gone, microop 3.

Now, O3 looks at the PC involved and starts fetching the bytes which
become the macroop which the microops are pulled from. Because microop 1
successfully completed, the CPU is now at user level, but because the
iret is on a kernel page, it can't be accessed. The kernel gets a page
fault.
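
The failure sequence above can be reproduced in a toy model. This is only
a sketch, not gem5 code: the names Page, fetch, and run_iret_with_refetch
are made up for illustration, and the four microops are the simplified
ones from the list above.

```python
from dataclasses import dataclass

KERNEL, USER = 0, 3  # x86-style privilege levels (illustrative values)


class PageFault(Exception):
    """Raised when a fetch touches a page the current level may not access."""


@dataclass
class Page:
    kernel_only: bool


def fetch(page, cpl):
    """Hypothetical fetch: return the microops of the iret stored on `page`."""
    if page.kernel_only and cpl != KERNEL:
        raise PageFault("fetch from kernel-only page at user level")
    return ["restore_privilege", "branch_if_kernel", "restore_user_stack", "end"]


def run_iret_with_refetch(iret_page):
    cpl = KERNEL
    microops = fetch(iret_page, cpl)  # initial fetch succeeds at kernel level
    cpl = USER                        # microop 1 completes and drops privilege
    # Microop 2 is mispredicted. Recovery refetches the macroop from memory
    # using the *current* privilege level, which is now user level.
    microops = fetch(iret_page, cpl)
    return microops[2]                # never reached for a kernel-only page
```

Calling run_iret_with_refetch(Page(kernel_only=True)) raises PageFault on
the second fetch, which mirrors the page fault the kernel sees.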

As I mentioned before, my partially implemented fix is to not only pass
back the PC, but to also pass back the macroop fetch should use instead
of making it refetch memory. The problem is that it's partially
implemented, and the way squashes work in O3 makes it really tricky to
implement it properly, or to tell whether or not it's implemented
properly.
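
As a rough sketch of that idea (not the actual O3 code): the squash
message carries the macroop along with the PC, and fetch only goes back to
memory when no macroop was passed. Macroop, Squash, Fetch, and the upc
field are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Macroop:
    name: str
    microops: Tuple[str, ...]


@dataclass(frozen=True)
class Squash:
    pc: int                            # address of the macroop
    upc: int                           # index of the microop to restart at
    macroop: Optional[Macroop] = None  # passed back so fetch can reuse it


class Fetch:
    def __init__(self, memory):
        self.memory = memory           # callable: pc -> Macroop (may fault)
        self.memory_fetches = 0

    def restart(self, squash):
        """Return the microop to resume at after a squash."""
        if squash.macroop is not None:
            macroop = squash.macroop   # reinstate the macroop, no refetch
        else:
            self.memory_fetches += 1
            macroop = self.memory(squash.pc)
        return macroop.microops[squash.upc]


# Demo: memory is no longer accessible, as after the privilege drop.
def faulting_memory(pc):
    raise RuntimeError("page fault at %#x" % pc)


iret = Macroop("iret", ("restore_privilege", "branch_if_kernel",
                        "restore_user_stack", "end"))
fetch_stage = Fetch(faulting_memory)
resumed = fetch_stage.restart(Squash(pc=0x1000, upc=2, macroop=iret))
```

With the macroop passed back, the restart lands on microop 3 without
touching memory. Leaving macroop as None falls back to the refetch and hits
the fault again.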

Gabe


On 11/13/11 19:21, Steve Reinhardt wrote:

I'd like to understand the issue a little better before commenting on a
solution.

Gabe, when you say "instruction" in your original description, do you mean
micro-op?

It seems to me that the fundamental problem is that we're trying to undo
the effects of a non-speculative micro-op, correct?  So the solution you're
pursuing is that branch mispredictions only roll back to the offending
micro-op, and don't force the entire macro-op containing that micro-op to
re-execute?

Is this predicted control flow entirely internal to the macro-op?  Or is
this an RFI where we are integrating the control transfer and the privilege
change?  If it is the latter, why does the RFI need to get squashed at all?


Steve

On Sun, Nov 13, 2011 at 4:34 PM, Gabe Black <[email protected]> wrote:

Yes, this is an existing bug and the branch predictor just pokes things
in the right way to expose it. The macroop isn't passed back in this
particular case, and with the code the way it is, it's difficult to even
tell that that's the case, let alone how to fix it. Cleaning things up
won't fix the problem itself, but it will make fixing the actual problem
tractable.

Gabe

On 11/13/11 16:16, Ali Saidi wrote:

I think this bug is just latent in the code right now and the branch
predictor change runs into it (this patch causes that branch to be
mispredicted). In any case I think the issue exists today and it's just
luck that it works currently.

Looking at your list I imagine you should be able to recover most things
from the dyninst, however I don't know if that is actually the case.
Accepting that the squashing mechanisms should be cleaned up, I'm not sure
how that is actually going to solve the problem. Don't we currently send
back the instruction? With the current instructions can't you figure out
the macro-op it belongs to?

Ali



On Nov 13, 2011, at 5:40 PM, Gabe Black wrote:

Hey folks. Ali has had a change out for a while ("Fix several Branch
Predictor issues") which improves branch predictor performance
substantially but breaks X86_FS on O3. It turns out the problem is that
an instruction is started which returns from kernel to user level and is
microcoded. The instruction is fetched from the kernel's address space
successfully and starts to execute, along the way dropping down to user
mode. Some microops later, there's some microop control flow which O3
mispredicts. When it squashes the mispredict and tries to restart, it
first tries to refetch the instruction involved. Since it's now at user
level and the instruction is on a kernel level only page, there's a page
fault and things go downhill from there.

I partially implemented a solution to this before where O3 reinstates
the macroop it had been using when it restarts fetch. The problem here
is that the path this kind of squash takes doesn't pass back the right
information, and my attempts to fix that have been unsuccessful. The
code that handles squashing in O3 is too complex: there's too much going
in all directions, and it's not always very clear what effect a change
will have in unrelated situations, or which callsites are involved in a
particular type of fault.

To me, it seems like the first step in fixing this problem is to clean
up how squashes are handled in O3 so that they can be made to
consistently handle squashes in non-restartable macroops.

Without having really dug into the specifics, I think we only need two
pieces of information when squashing, a pointer to the guilty
instruction and whether execution should start at or after it. It would
start at it if the instruction needed to be reexecuted due to a memory
dependence violation, for instance, and would start after it for faults,
interrupts, or branch mispredicts. Any other information that's needed
like sequence numbers or actual control flow targets can be retrieved
from the instructions where needed without having to split everything
out and pass them around individually.
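
A minimal sketch of what such a two-field squash request could look like,
assuming the instruction object already records its own sequence number
and resolved next PC. DynInst here is a stand-in with made-up fields, not
gem5's class, and SquashCause, SquashRequest, and make_squash are
hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SquashCause(Enum):
    MEMORY_ORDER_VIOLATION = auto()
    FAULT = auto()
    INTERRUPT = auto()
    BRANCH_MISPREDICT = auto()


@dataclass(frozen=True)
class DynInst:
    seq_num: int
    pc: int
    upc: int        # microop index within the macroop
    next_pc: int    # resolved target, assumed to be recorded at execute
    next_upc: int


@dataclass(frozen=True)
class SquashRequest:
    inst: DynInst      # the guilty instruction
    restart_at: bool   # True: re-execute inst; False: resume after it

    def youngest_surviving_seq_num(self):
        """Everything younger than this sequence number is squashed."""
        if self.restart_at:
            return self.inst.seq_num - 1
        return self.inst.seq_num

    def restart_point(self):
        """(pc, upc) where fetch resumes, derived from the instruction."""
        if self.restart_at:
            return (self.inst.pc, self.inst.upc)
        return (self.inst.next_pc, self.inst.next_upc)


def make_squash(inst, cause):
    # Only a memory dependence violation re-executes the guilty instruction;
    # faults, interrupts, and branch mispredicts restart after it.
    restart_at = cause is SquashCause.MEMORY_ORDER_VIOLATION
    return SquashRequest(inst, restart_at)
```

Sequence numbers and targets are read off the instruction on demand, so
the request itself stays at two fields.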

Is there any obvious problem with doing things this way? I don't think
I'll personally have a lot of time to dedicate to this at the very least
in the short term, but I wanted to get the conversation going so we know
what to do when somebody has a chance to do it.

Gabe



_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
