> On Jan. 6, 2016, 9:39 p.m., Andreas Hansson wrote:
> > I still find that the design needs a far better explanation, and at least 
> > one simple example. I would suggest implementing the SwapReq as an atomic 
> > op (as I guess it is our only example). Also, surely we need to impose 
> > restrictions on the granularity? If we use SwapReq as an example then 
> > things would be much more clear.
> > 
> > I am also not sure about Joel's comments having been addressed.
> 
> Tony Gutierrez wrote:
>     Our GPU code has lots of examples for AtomicOps, and the concept is 
> pretty simple: use function pointers to perform the atomic RMW op instead of 
> adding a bunch of conditions to check all possible SwapReqs. Here is how our 
> GPU CAS is implemented, and you can see all AtomicOps in gpu_dyn_inst.hh:
>     
>     template<typename T>
>     class AtomicOpCAS : public TypedAtomicOpFunctor<T>
>     {
>       public:
>         T c;
>         T s;
>     
>         ComputeUnit *computeUnit;
>     
>         AtomicOpCAS(T _c, T _s, ComputeUnit *compute_unit)
>           : c(_c), s(_s), computeUnit(compute_unit) { }
>     
>         void
>         execute(T *b)
>         {
>             computeUnit->numCASOps++;
>     
>             if (*b == c) {
>                 *b = s;
>             } else {
>                 computeUnit->numFailedCASOps++;
>             }
>     
>             if (computeUnit->xact_cas_mode) {
>                 computeUnit->xactCasLoadMap.clear();
>             }
>         }
>     };
>     
>     A call to the functor will call execute(). In this case the ISA inst 
> objects set the swap value 's' based on one of its src operands, and if the 
> current value 'b' hasn't changed from the orignial 'c' (another src operand), 
> b gets s.
>     
>     I think implementing the SwapReq as an AtomicOp would be as simple as 
> moving the logic in if (pkt->cmd == MemCmd::SwapReq) in abstract_mem.cc to a 
> derived AtomicOpFunctor class and calling it, but I would prefer not to make 
> that change part of this patch. I'm not sure how the AtomicOpFunctor would be 
> setup by the CPU ISAs here and I don't have the time or the infrastructure to 
> test the changes sufficiently to ensure they work properly for the CPU ISAs 
> that use this (ARM and SPARC?), so I think it'd be safer to leave it as it is 
> for now.
>     
>     Getting this patch in is necessary for us to push our GPU code, which is 
> pretty important to lots of people in the community - at least based on the 
> e-mails from users I've been getting - and I would hope this issue wouldn't 
> hold that up.
>     
>     As far as restricting the granularity, I can't imagine a case where one 
> would try to perform a RMW on a size greater than a cache line, but to ensure 
> it I can add a fatal if the size is greater than a cache block.
> 
> Andreas Hansson wrote:
>     I am still trying to come to terms with the overall design choices, and 
> my feeling is that this opens up for lots of complexity (or at least 
> non-transparency).
>     
>     My main hesitation is that this completely departs from the existing gem5 
> self-sufficient request/packet structure, and by doing so makes wrapping in 
> TLM2 or SST more or less impossible. The bridging is really important, and so 
> to justify a change like this we need to carefully consider the pros and cons.
>     
>     What is the actual data payload of the packet in your example? Does it 
> not make more sense to at least tie these operations to the request or packet 
> (not just by pointers), and that way make sure that at least the payload 
> reflects the current state?
> 
> Tony Gutierrez wrote:
>     Ah, your issue is clear to me now. The actual payload, unfortunately, IS 
> the functor. As I mentioned to Joel previously, these carry the data around 
> with them, which is why the original author used functors instead of function 
> pointers. This is also something I have been meaning to fix, but haven't had 
> time.
>     
>     If I move all the functors to the request would you be ok with me pushing 
> this now? The ultimate goal being that the req would carry a regular function 
> pointer to a member method around, and the data payload will be part of the 
> packet data/extra data, and will be moved out of the functors.
> 
> Joel Hestness wrote:
>     Yep, it sounds like Andreas is interested in the same thing as me. The 
> thing is, if we changed this patch as Tony offers above, we still need the 
> function pointer in the request or packet metadata, so it doesn't really 
> address the issue. It's more busy work, so I'd rather just advocate for 
> addressing this in a later patch.
>     
>     I forgot to mention previously that gem5-gpu has code for packing atomics 
> into packet payload (line ~355: 
> http://gem5-gpu.cs.wisc.edu/repo/gem5-gpu/file/92d1e15704a4/src/gpu/lsq_warp_inst_buffer.cc
>  ). The process is a little tricky and uses data size rather than templates 
> there, but might be instructive when we get to that.

I'd be far more comfortable with these changes, but I would also like the 
people that have contributed the SST and TLM wrappers to have a look. I really 
worry about interoperability here.


- Andreas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3205/#review7813
-----------------------------------------------------------


On Jan. 6, 2016, 9:12 p.m., Tony Gutierrez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/3205/
> -----------------------------------------------------------
> 
> (Updated Jan. 6, 2016, 9:12 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11280:f5e328bd5233
> ---------------------------
> mem: support for gpu-style RMWs in ruby
> 
> This patch adds support for GPU-style read-modify-write (RMW) operations in
> ruby. Such atomic operations are traditionally executed at the memory 
> controller
> (instead of through an L1 cache using cache-line locking).
> 
> Currently, this patch works by propogating operation functors through the 
> memory
> system.
> 
> 
> Diffs
> -----
> 
>   src/base/types.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
>   src/mem/abstract_mem.cc d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
>   src/mem/packet.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
>   src/mem/protocol/RubySlicc_Exports.sm 
> d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
>   src/mem/request.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
> 
> Diff: http://reviews.gem5.org/r/3205/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Tony Gutierrez
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

Reply via email to