Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

Nikos Nikoleris Mon, 09 Jan 2017 02:21:02 -0800


> On Jan. 5, 2017, 4:32 p.m., Tony Gutierrez wrote:
> > Is there anything holding this up from being shipped?
> 
> Andreas Hansson wrote:
>     In my view the patch needs two things:
>     
>     1) Some thought around the design. I am still hoping there is a less 
> invasive way of accommodating the functionality, possibly with some changes 
> to the existing functions of the cache.
>     
>     2) A way to test it. This should preferably include both synthetic and 
> real use-cases. The memtester and memchecker may be a good starting point for 
> the synthetic part.
>     
>     It would be great if someone could dig into these issues.
> 
> Steve Reinhardt wrote:
>     It's been quite a while since I've looked at this... in fact I was going 
> to say that there might be a memory leak, but looking back at the patch 
> history I see I fixed that already in May 2015 :).
>     
>     It's pretty easy to come up with a simple test case, just using C++11 
> atomic ops or pthread mutexes. Unfortunately the ones I had were on my AMD 
> work system that I had to give back, and it looks like I missed them when I 
> went to copy off useful non-proprietary stuff before I left. I know I didn't 
> spend much time writing tests though, so they should be easy to recreate.
>     
>     As far as being less invasive, I think my reply to Andreas's comment of 
> May 7, 2015 is still relevant: given the current structure of the cache code, 
> I think the existing solution is really not that invasive. There isn't really 
> that much code being added here; most of it is comments. Generating a 
> synthetic request to allow allocating an MSHR so that we can accumulate the 
> deferred operations that happen while the lock is held is a little strange, 
> but it allows us to reuse the bulk of the code in recvTimingResp() with just 
> a few changes.
>     
>     If the cache code were heavily restructured, we could probably layer it 
> in such a way that the fake packets would not be necessary, and thus the 
> invasiveness of the locked RMW support could be reduced. However, I started 
> down this path a long time ago (shortly after his comment), and after putting 
> a fair amount of work into the restructuring, I still didn't even have the 
> code working at all, so I gave up. Thus I think the net invasiveness of 
> restructuring the cache to reduce the amount of change required specifically 
> for this feature is much much larger than the invasiveness of this patch by 
> itself. The end result would probably be better code, but it's a huge amount 
> of work that I don't see anyone signing up for right now, so I think it's 
> better to put this change in as is for now, and put "restructuring the cache" 
> on the to-do list.
> 
> Nikos Nikoleris wrote:
>     I agree, the implementation of locked instructions wouldn't be trivial 
> with the current structure of the cache. It would be nice if we didn't have 
> to abuse the MSHR in that way, it makes it hard to reason about memory leaks, 
> and it can't solve the problem for atomic and functional accesses.
>     
>     Wouldn't it be easier to implement this in the same way as far atomics? 
> If I understand correctly the semantics of locked instructions, it would be 
> significantly simpler if we could perform these operations with a single RMW 
> packet rather than two separate, a read and then a write. Wouldn't the 
> AtomicOpFunctor in the packet class be sufficient?
>     
>     Tony would that work for you?
> 
> Steve Reinhardt wrote:
>     Thanks for the input, Nikos. While it would be a lot simpler to do the 
> atomic ops in the cache using functors, x86 lets you put a LOCK prefix in 
> front of a fairly long list of instructions, and then you have to combine 
> that with all the different data sizes that each instruction supports. Plus 
> it would be a pretty bi disruption to the whole CPU ISA definition to create 
> and plumb those packets through to the memory interface, and it would make 
> the locked instructions copmletely different from their unlocked 
> counterparts. These are all the reasons we didn't do it this way to begin 
> with, though in retrospect it's not as infeasible as perhaps it seemed up 
> front; just complicated and a little ugly. Not necessarily worse than the 
> current patch.
>     
>     Actually though there is no problem with atomic and functional 
> accesses... it's easy to make them atomic, as you just issue the read and the 
> write within the same event tick/process() method (i.e., without going back 
> to the event queue). It's only timing accesses that are hard with this 
> interface.


I thought that the AtomicOpFunctor solves exactly this issue. While decoding 
the instruction you pass a function pointer which operates on the data in the 
memory system. Does the memory system need to be aware of the specifics of the 
operation? If I understand correctly the cache will only call the function 
pointer and it won't need to decode the instruction or know anything about it 
for that matter. I am not really familiar with the LOCK prefix but do we need 
to support something more than arithmetic operations?

It would be great if this was not necessarily tied to the x86 LOCK prefix but a 
generic mechanism to support far atomics across different architectures.


- Nikos


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2691/#review9228
-----------------------------------------------------------


On April 15, 2016, 5:42 a.m., Steve Reinhardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2691/
> -----------------------------------------------------------
> 
> (Updated April 15, 2016, 5:42 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11444:8a1419dbbfa6
> ---------------------------
> mem: implement x86 locked accesses in timing-mode classic cache
> 
> Add LockedRMW(Read|Write)(Req|Resp) commands.  In timing mode,
> use a combination of clearing permission bits and leaving
> an MSHR in place to prevent accesses & snoops from touching
> a locked block between the read and write parts of an locked
> RMW sequence.
> 
> 
> Diffs
> -----
> 
>   src/mem/cache/cache.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/cache/mshr.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.hh df24b9af42c72606f1fa8e5aa0502b53e81ea176 
>   src/mem/packet.cc df24b9af42c72606f1fa8e5aa0502b53e81ea176 
> 
> Diff: http://reviews.gem5.org/r/2691/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Steve Reinhardt
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

Re: [gem5-dev] Review Request 2691: mem: implement x86 locked accesses in timing-mode classic cache

Reply via email to