Hi Nilay,

I apologize that it has taken me a few days to respond. I need to read
my gem5-dev email more often.

First off, I just want to be clear that we are only discussing
lock-prefixed RMW instructions, correct? Non-locked RMWs are not an
issue.

In my opinion, the absolute best source for understanding the x86 memory
model is Sewell et al.:
http://doi.acm.org/10.1145/1785414.1785443
In the paper, they explain when processors can logically execute
lock-prefixed instructions in a very clear and intuitive way.

As Steve said, lock-prefixed instructions act as fences, but they also
immediately retire to the memory system to maintain global ordering.
Thus a lock-prefixed instruction cannot logically complete until all
prior loads and stores from that processor have been retired to the
memory system. In other words, the load and store buffers must be empty.
Furthermore, the result of the lock-prefixed instruction must become
visible as soon as the instruction retires. In other words, the store
buffer cannot hold on to the store value after the core retires the
instruction.

I think the main question here is how the O3 load/store queue responds
to the serialize-before, serialize-after, and fence flags. Essentially,
we need to use the combination of flags that flushes the load and store
buffers before logically executing the load portion of the locked RMW,
and that bypasses the store buffer when executing the store portion of
the locked RMW.

There are certainly optimizations that can be implemented to maintain
that logical behavior while allowing the hardware to do more parallel
execution. However, I would suggest not trying to implement those before
getting the core functionality to work using existing mechanisms.

On a related note, have you thought about how you're going to propagate
Ruby probes back to the O3 load buffer? Assuming a snooping load queue,
that is one core mechanism that we need to implement to support
X86+O3+Ruby. It might be useful for us to discuss different possible
interface implementations before you spend too much time writing code.
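
To make the logical behavior of the locked RMW described above concrete,
here is a minimal toy sketch in Python. It is not gem5 code and does not
model the O3 load/store queue or Ruby: the ToyCore class and all of its
method names are hypothetical, loads complete instantly (so there is no
load buffer to empty), and the only thing it illustrates is "drain the
store buffer first, then write the result straight to memory".

```python
from collections import deque


class ToyCore:
    """Hypothetical TSO-style core: a FIFO store buffer in front of memory."""

    def __init__(self, memory):
        self.memory = memory          # dict shared by all cores
        self.store_buffer = deque()   # (addr, value) pairs, oldest first

    def store(self, addr, value):
        # An ordinary store is buffered and is not yet visible to other cores.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Forward from the youngest matching buffered store, else read memory.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def drain(self):
        # Retire buffered stores to memory in program (FIFO) order.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value

    def locked_rmw(self, addr, update):
        # Earlier stores must reach memory before the load portion executes.
        self.drain()
        old = self.memory.get(addr, 0)
        # The store portion bypasses the store buffer and is visible at once.
        self.memory[addr] = update(old)
        return old


memory = {"counter": 0}
core0 = ToyCore(memory)
core1 = ToyCore(memory)

core0.store("x", 1)
seen_before = core1.load("x")   # core0's store is still buffered
old0 = core0.locked_rmw("counter", lambda v: v + 1)
seen_after = core1.load("x")    # the locked RMW drained core0's buffer
old1 = core1.locked_rmw("counter", lambda v: v + 1)
```

In this sketch core1 only observes core0's store to x after core0
executes its locked RMW, and the two increments of counter never overlap
because each one reads and writes memory directly.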

Brad

> -----Original Message-----
> From: [email protected] [mailto:gem5-dev-
> [email protected]] On Behalf Of Steve Reinhardt
> Sent: Thursday, October 27, 2011 10:09 AM
> To: gem5 Developer List
> Subject: Re: [gem5-dev] Locked RMW in Ruby
>
> Hi Nilay,
>
> I think a memory barrier may not be sufficient... we need to make sure it's
> non-speculative as well as ordered (unless we do something more
> complicated to deal with a speculative locked read that isn't followed by a
> write because it got squashed).
>
> Gabe is a better reference (the only reference?) for the details of the x86
> decoder.
>
> Steve
>
> On Thu, Oct 27, 2011 at 8:32 AM, Nilay Vaish <[email protected]> wrote:
>
> > I am thinking of marking all the locked instructions with IsMemBarrier.
> > Where do you think this flag should appear - in locked_opcodes.isa, or
> > in semaphores.py? I tried adding IsMemBarrier to the instructions in
> > locked_opcodes.isa, but that does not work. I changed the instruction
> > format to BasicOperate, that also does not work.
> >
> > --
> > Nilay
> >
> >
> > On Wed, 26 Oct 2011, Gabe Black wrote:
> >
> >> I think you guys are on the right track. There's a non-speculative
> >> flag, a serialize before, and a serialize after. I'm not sure which
> >> one is exactly right, but some combination should be. We should be
> >> careful not to overdo it since that might artificially hurt
> >> performance, but I don't *think* the lock prefix is used all that
> >> much these days so it shouldn't have *too* bad an impact if it isn't
> >> perfectly correct.
> >>
> >> Gabe
> >>
> >> On 10/26/11 09:56, Nilay Vaish wrote:
> >>
> >>> On Tue, 25 Oct 2011, Steve Reinhardt wrote:
> >>>
> >>>> Good questions. Clearly if we ever let the R part of an RMW
> >>>> instruction out to the cache, either we have to commit the
> >>>> instruction or add some mechanism to unlock the block. One
> >>>> solution would be to mark all RMW instructions as serializing,
> >>>> which would prevent them from executing speculatively. That (or
> >>>> something like it) might be necessary to get the consistency
> >>>> model right anyway, since I believe locked accesses act as fences
> >>>> (?? is that right, Brad?).
> >>>>
> >>>> Gabe, did you have an alternate solution in mind?
> >>>>
> >>>> Steve
> >>>>
> >>>> On Tue, Oct 25, 2011 at 2:15 PM, Nilay Vaish <[email protected]> wrote:
> >>>>
> >>>>> Does this mean that an x86 O3 CPU will never squash an RMW
> >>>>> instruction? I am posting an instruction + protocol trace
> >>>>> obtained from O3 and Ruby. In the first portion, you can see
> >>>>> that the O3 CPU issues a locked RMW with the read part having
> >>>>> sn = 3051 and the write part having sn = 3052. In the second
> >>>>> portion, you can see that 3051 and 3052 are squashed, and in the
> >>>>> third portion of the trace, these are committed. There are
> >>>>> several things that I am not able to understand. Why is the RMW
> >>>>> squashed, since x86 architecture has to commit the instruction?
> >>>>> Secondly, if RMW was being executed speculatively, then what
> >>>>> mechanism exists for informing the cache controller about the
> >>>>> instruction getting squashed? Thirdly, why was the instruction
> >>>>> committed later on, when it was originally squashed?
> >>>>>
> >>>
> >>> When I mark ldstl and stul as non-speculative, the O3 CPU and Ruby
> >>> work on an example code in which two threads are incrementing a
> >>> counter. Since locked RMW is a fence instruction (Steve suggested
> >>> this above and AMD's manual agrees), it seems that the read portion
> >>> should commit any of the loads and stores that appear before it in
> >>> the program order. This means that ldstl should be marked as memory
> >>> barrier, and similarly stul should also be marked as memory barrier.
> >>> But looking at src/arch/x86/isa/microops/ldstop.isa, it does not
> >>> seem that this flag can currently be supported. If others
> >>> (especially Steve and Gabe) concur with my understanding, I can
> >>> modify the file to add the memory barrier flag.
> >>>
> >>> --
> >>> Nilay

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
