On Thu, 27 Oct 2011, Beckmann, Brad wrote:
Hi Nilay,
I apologize it has taken me a few days to respond. I need to read my
gem5-dev email more often.
First off, I just want to be clear that we are only discussing locked
prefixed RMW instructions, correct? Non-locked RMW are not an issue.
Right, Ruby does not lock the address in case of non-locked RMW.
In my opinion, the absolute best source to understand the x86 memory
model is Sewell et al. (http://doi.acm.org/10.1145/1785414.1785443). In
the paper, they explain when processors can logically execute locked
prefixed instructions in a very clear and intuitive way. As Steve said,
locked prefixed instructions act as fences, but they also immediately
retire to the memory system to maintain global ordering. Thus the
locked prefixed instruction cannot logically complete until all prior
lds and sts from that processor have been retired to the memory system.
In other words, the load and store buffers must be empty. Furthermore,
the effect of the locked prefixed instruction must become globally
visible as soon as the instruction retires. In other words, the store
buffer cannot hold on to the store value after the core retires the
instruction.
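
To make the two requirements above concrete (empty buffers before the locked instruction, no buffering of its store), here is a minimal sketch of a store-buffer core in the spirit of the x86-TSO model. It is not gem5 or Ruby code: the class ToyTSOCore and all of its method names are hypothetical, and the load buffer is left out for brevity.

```python
from collections import deque


class ToyTSOCore:
    """Hypothetical toy core: a FIFO store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.store_buffer = deque()   # pending (address, value), oldest first

    def store(self, addr, value):
        # Ordinary stores wait in the store buffer; memory is not updated yet.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Ordinary loads forward from the youngest matching buffered store.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def drain(self):
        # Retire every buffered store to memory in program order.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value

    def locked_rmw(self, addr, fn):
        # Requirement 1: all prior stores are retired to memory first.
        self.drain()
        # Requirement 2: the write goes straight to memory, bypassing the
        # store buffer, so it is visible as soon as the instruction retires.
        old = self.memory.get(addr, 0)
        self.memory[addr] = fn(old)
        return old


mem = {}
core = ToyTSOCore(mem)
core.store(0x10, 1)                            # still only in the store buffer
old = core.locked_rmw(0x20, lambda v: v + 1)   # drains, then updates memory
```

After the locked_rmw call, both the earlier store to 0x10 and the new value at 0x20 are in mem, and the store buffer is empty.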
I am assuming that if an instruction is marked as a memory barrier, the O3
CPU will drain the load and store buffers before and after the
instruction.
I think the main question here is how does the O3 ld/st queue respond to
the serialize before, serialize after, and fence flags? Essentially, we
need to use the combination of flags that flushes the ld and st buffers
before logically executing the load portion of the locked RMW, as well
as bypasses the store buffer when executing the store portion of the
locked RMW. There are certainly optimizations that can be implemented
to maintain that logical behavior, while allowing the hardware to do
more parallel execution. However, I would suggest not trying to
implement those before getting the core functionality to work using
existing mechanisms.
I am in agreement with you.
On a related note, have you thought about how you're going to propagate
Ruby probes back to the O3 load buffer? Assuming a snooping load queue,
that is one core mechanism that we need to implement to support
X86+O3+Ruby. It might be useful for us to discuss different possible
interface implementations before you spend too much time writing code.
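
As a starting point for that discussion, here is a minimal sketch of what a snooping load queue could do when it receives an invalidation probe. Everything in it is an assumption for illustration: ToySnoopingLoadQueue, its methods, and the 64-byte line size are hypothetical and do not correspond to the O3 LSQ or to any Ruby interface. The policy shown is deliberately conservative: any executed but uncommitted load to the probed line triggers a squash.

```python
from dataclasses import dataclass


@dataclass
class LoadEntry:
    seq_num: int            # program-order sequence number
    addr: int               # byte address of the load
    executed: bool = False  # True once the load has obtained its value


class ToySnoopingLoadQueue:
    """Hypothetical load queue that checks invalidation probes."""

    def __init__(self, line_bytes=64):
        self.line_bytes = line_bytes
        self.entries = []   # uncommitted loads, oldest first

    def _line(self, addr):
        return addr // self.line_bytes

    def insert(self, seq_num, addr):
        self.entries.append(LoadEntry(seq_num, addr))

    def mark_executed(self, seq_num):
        for entry in self.entries:
            if entry.seq_num == seq_num:
                entry.executed = True

    def commit_oldest(self):
        # A committed load no longer needs to be watched.
        return self.entries.pop(0)

    def snoop_invalidate(self, addr):
        """Return the seq_num of the oldest load to squash from, or None."""
        line = self._line(addr)
        for entry in self.entries:
            if entry.executed and self._line(entry.addr) == line:
                return entry.seq_num
        return None


lq = ToySnoopingLoadQueue()
lq.insert(1, 0x100)
lq.insert(2, 0x140)
lq.mark_executed(2)                      # load 2 ran ahead of load 1
squash_from = lq.snoop_invalidate(0x148) # probe hits the line load 2 read
```

Here squash_from is 2, because load 2 already read a value from the invalidated line while it was still uncommitted. A probe to the line of load 1 returns None, since that load has not executed yet.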
Brad
I have a patch for this available on review board. This is the link --
http://reviews.gem5.org/r/894/
--
Nilay
-----Original Message-----
From: [email protected] [mailto:gem5-dev-
[email protected]] On Behalf Of Steve Reinhardt
Sent: Thursday, October 27, 2011 10:09 AM
To: gem5 Developer List
Subject: Re: [gem5-dev] Locked RMW in Ruby
Hi Nilay,
I think a memory barrier may not be sufficient... we need to make sure it's
non-speculative as well as ordered (unless we do something more
complicated to deal with a speculative locked read that isn't followed by a
write because it got squashed).
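
The squash problem can be shown with a small toy model. This is only a sketch under stated assumptions: ToyLockedLine and run_locked_rmw are hypothetical names, and the model assumes the memory system locks a line on the locked read and releases it only on the matching write.

```python
class ToyLockedLine:
    """Hypothetical cache line that is locked between the read and the write."""

    def __init__(self):
        self.owner = None   # core id holding the lock, or None

    def locked_read(self, core):
        if self.owner is not None:
            raise RuntimeError("line already locked")
        self.owner = core

    def locked_write(self, core):
        if self.owner != core:
            raise RuntimeError("write without matching locked read")
        self.owner = None


def run_locked_rmw(line, core, non_speculative, gets_squashed):
    # A non-speculative instruction is issued only once it is known to
    # commit, so a squashed one never reaches the memory system at all.
    if non_speculative and gets_squashed:
        return
    line.locked_read(core)
    if gets_squashed:
        return              # squashed after the read: the write never issues
    line.locked_write(core)


speculative_line = ToyLockedLine()
run_locked_rmw(speculative_line, core=0, non_speculative=False, gets_squashed=True)

safe_line = ToyLockedLine()
run_locked_rmw(safe_line, core=0, non_speculative=True, gets_squashed=True)
```

After these calls, speculative_line.owner is still 0 (the lock has leaked), while safe_line.owner is None.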
Gabe is a better reference (the only reference?) for the details of the x86
decoder.
Steve
On Thu, Oct 27, 2011 at 8:32 AM, Nilay Vaish <[email protected]> wrote:
I am thinking of marking all the locked instructions with IsMemBarrier.
Where do you think this flag should appear: in locked_opcodes.isa, or
in semaphores.py? I tried adding IsMemBarrier to the instructions in
locked_opcodes.isa, but that does not work. I also tried changing the
instruction format to BasicOperate, but that does not work either.
--
Nilay
On Wed, 26 Oct 2011, Gabe Black wrote:
I think you guys are on the right track. There's a non-speculative flag,
a serialize before, and a serialize after. I'm not sure which one is
exactly right, but some combination should be. We should be careful
not to overdo it since that might artificially hurt performance, but
I don't *think* the lock prefix is used all that much these days so
it shouldn't have *too* bad an impact if it isn't perfectly correct.
Gabe