Re: [gem5-dev] Review Request 3205: mem: support for gpu-style RMWs in ruby

Tony Gutierrez Fri, 08 Jan 2016 15:17:48 -0800


> On Jan. 6, 2016, 1:39 p.m., Andreas Hansson wrote:
> > I still find that the design needs a far better explanation, and at least 
> > one simple example. I would suggest implementing the SwapReq as an atomic 
> > op (as I guess it is our only example). Also, surely we need to impose 
> > restrictions on the granularity? If we use SwapReq as an example then 
> > things would be much more clear.
> > 
> > I am also not sure about Joel's comments having been addressed.
> 
> Tony Gutierrez wrote:
>     Our GPU code has lots of examples for AtomicOps, and the concept is 
> pretty simple: use function pointers to perform the atomic RMW op instead of 
> adding a bunch of conditions to check all possible SwapReqs. Here is how our 
> GPU CAS is implemented, and you can see all AtomicOps in gpu_dyn_inst.hh:
>     
>     template<typename T>
>     class AtomicOpCAS : public TypedAtomicOpFunctor<T>
>     {
>       public:
>         T c;
>         T s;
>     
>         ComputeUnit *computeUnit;
>     
>         AtomicOpCAS(T _c, T _s, ComputeUnit *compute_unit)
>           : c(_c), s(_s), computeUnit(compute_unit) { }
>     
>         void
>         execute(T *b)
>         {
>             computeUnit->numCASOps++;
>     
>             if (*b == c) {
>                 *b = s;
>             } else {
>                 computeUnit->numFailedCASOps++;
>             }
>     
>             if (computeUnit->xact_cas_mode) {
>                 computeUnit->xactCasLoadMap.clear();
>             }
>         }
>     };
>     
>     A call to the functor will call execute(). In this case the ISA inst 
> objects set the swap value 's' based on one of its src operands, and if the 
> current value 'b' hasn't changed from the orignial 'c' (another src operand), 
> b gets s.
>     
>     I think implementing the SwapReq as an AtomicOp would be as simple as 
> moving the logic in if (pkt->cmd == MemCmd::SwapReq) in abstract_mem.cc to a 
> derived AtomicOpFunctor class and calling it, but I would prefer not to make 
> that change part of this patch. I'm not sure how the AtomicOpFunctor would be 
> setup by the CPU ISAs here and I don't have the time or the infrastructure to 
> test the changes sufficiently to ensure they work properly for the CPU ISAs 
> that use this (ARM and SPARC?), so I think it'd be safer to leave it as it is 
> for now.
>     
>     Getting this patch in is necessary for us to push our GPU code, which is 
> pretty important to lots of people in the community - at least based on the 
> e-mails from users I've been getting - and I would hope this issue wouldn't 
> hold that up.
>     
>     As far as restricting the granularity, I can't imagine a case where one 
> would try to perform a RMW on a size greater than a cache line, but to ensure 
> it I can add a fatal if the size is greater than a cache block.
> 
> Andreas Hansson wrote:
>     I am still trying to come to terms with the overall design choices, and 
> my feeling is that this opens up for lots of complexity (or at least 
> non-transparency).
>     
>     My main hesitation is that this completely departs from the existing gem5 
> self-sufficient request/packet structure, and by doing so makes wrapping in 
> TLM2 or SST more or less impossible. The bridging is really important, and so 
> to justify a change like this we need to carefully consider the pros and cons.
>     
>     What is the actual data payload of the packet in your example? Does it 
> not make more sense to at least tie these operations to the request or packet 
> (not just by pointers), and that way make sure that at least the payload 
> reflects the current state?
> 
> Tony Gutierrez wrote:
>     Ah, your issue is clear to me now. The actual payload, unfortunately, IS 
> the functor. As I mentioned to Joel previously, these carry the data around 
> with them, which is why the original author used functors instead of function 
> pointers. This is also something I have been meaning to fix, but haven't had 
> time.
>     
>     If I move all the functors to the request would you be ok with me pushing 
> this now? The ultimate goal being that the req would carry a regular function 
> pointer to a member method around, and the data payload will be part of the 
> packet data/extra data, and will be moved out of the functors.
> 
> Joel Hestness wrote:
>     Yep, it sounds like Andreas is interested in the same thing as me. The 
> thing is, if we changed this patch as Tony offers above, we still need the 
> function pointer in the request or packet metadata, so it doesn't really 
> address the issue. It's more busy work, so I'd rather just advocate for 
> addressing this in a later patch.
>     
>     I forgot to mention previously that gem5-gpu has code for packing atomics 
> into packet payload (line ~355: 
> http://gem5-gpu.cs.wisc.edu/repo/gem5-gpu/file/92d1e15704a4/src/gpu/lsq_warp_inst_buffer.cc
>  ). The process is a little tricky and uses data size rather than templates 
> there, but might be instructive when we get to that.
> 
> Andreas Hansson wrote:
>     I'd be far more comfortable with these changes, but I would also like the 
> people that have contributed the SST and TLM wrappers to have a look. I 
> really worry about interoperability here.
> 
> Tony Gutierrez wrote:
>     We agree with you in principle on the design not being great, and as I 
> mentioned this is somethign that was already on my 'nice to have' list, but 
> it wasn't high priorty. Now that we understand your conerns better, it has 
> become higher priority. That said, I think our code is isolated relatively 
> well at this point, so it shoudn't affect anything as of now. The pointer 
> will be carried around by the request, but if you're not running GPU code it 
> shouldn't matter - unless the current CAS were to be implemented as an 
> AtomicOpFunctor.
>     
>     I can try to (quickly, thus dangerously) fix up this patch before 
> shipping, but I'd prefer not to as to avoid breaking any current ISAs. 
> Alternatively I can push it as-is, then when I have more time in a few weeks, 
> I (hopefully with input from Joel) can try to unify the design properly, and 
> move the data out of the functors and into the packets.
> 
> Curtis Dunham wrote:
>     This type of approach seems entirely incompatible with an external memory 
> interface like SST (which I maintain the wrapper for).  I agree with the data 
> going into the packet instead of the functor, but then you're still left with 
> the functor, which is an overly general interface.  Out of curiosity, how 
> many of these operations would there be?  I think the cleanest thing would be 
> for them to all be their own MemCmd - would we be talking about just a few 
> more commands, or dozens?
> 
> Tony Gutierrez wrote:
>     I believe there are 22 atomic RMW operations in the HSAIL ISA our GPU 
> model executes; clearly this would lead to too many new MemCmd types, and 
> increase the complexity of the logic for performing atomic RMWs. I think the 
> design of simply specifying the MemCmd as an atomic RMW (SwapReq), and then 
> just executing the functor, which would then perform the actual operation, is 
> a much cleaner design. And this is doable becaue the atomic RMW ops have the 
> same API. I guess I'm not understanding why the use of a general function ptr 
> here would cause a problem for the SST interface - which I know next to 
> nothing about admittedly.
>     
>     If you have a function pointer available in the packet, but aren't using 
> it at all, would this still break things?
>     
>     If all current, necessary RMW (only CAS afaik) were to be implemented as 
> member methods of request, then the atomic op function pointer would just be 
> set to the appropriate member method based on which atomic is actually being 
> executed, thereby restoring self sufficiency of the packet/req. Would that 
> still break the ineroperability?
>     
>     I understand why moving data out of the packet breaks things, but I'm 
> still not clear as to why using a function pointer to call the actual atomic 
> op would break things.
> 
> Brad Beckmann wrote:
>     Curtis and Andreas, we need to check in this patch.  Like Tony said, this 
> patch does not break the external memory interface and SST.  Yes, it is 
> incompatible, but if you don't use the functor, nothing should break.  Please 
> stop demanding perfection the first time a feature is implemented.  We are 
> committed to further improving the feature and we will make more of the 
> changes suggested by you and Joel.  However we to move on from this initial 
> patch.  This is an open source project and none of us are being paid to 
> develop perfect gem5 code.
>     
>     In general, please keep in mind that our team rarely (if ever) demands 
> perfect code from you.  We would appreciate if you reciprocated.
> 
> Andreas Hansson wrote:
>     We are not asking for perfection, we are asking for a design that takes a 
> balanced approach in accommodating existing and future use-cases.
>     
>     I am increasingly convinced that we should just go for encoding of the 
> operations in the packet MemCmd, possibly with the addition of a MemSubCmd. 
> For example, looking at the ARM atomic operations, they are split into atomic 
> compare, atomic swap, atomic load and atomic store. The two latter operations 
> are then combined with a sub-operation, like XOR, ADD, MAX, MIN etc. I'd 
> imagine this largely overlaps with the operations you are introducing. Don't 
> you think? The implementation could be put in a src/mem/atomic_helpers.hh/cc 
> or similar and called by the AbstractMemory, or any caches.
>     
>     Keeping everything in the packet is relly the key concern here, and to 
> answer Tony's question, the current bridges rely on a lookup translation of 
> the fields on the packet. No functions are being called "through" the bridge. 
> I hope that makes things more clear.
> 
> Tony Gutierrez wrote:
>     That is fine. We're open to changing the design going forward - I was 
> already planning on doing so, and it sounds like our respective design 
> proposals would be very similar - and yes, I'm sure the design could converge 
> for all ISA atomics. Doing so, however, is going to be a fairly significant 
> effort. Given that it will take time to converge on the design, then 
> implement it, we'd prefer not to have that hold up this patch, thereby 
> holding up our GPU code.
>     
>     I understand the functions are not being called, I'm just curious how 
> adding a new field to the packet would break that. I was imagining that a 
> gem5 packet would be embedded in some sort of 'packet' SST et al. could 
> consume, and SST would know how to interpret gem5's packet fields. In such a 
> case why can't you just ignore that function pointer field? Since you're not 
> using any atomic ops, it would always be null anyhow. When we eventually 
> clean this up, the possible functions the pointer could point to would be 
> included in Request.
>     
>     The only difference between what I was envisioning, and what you're 
> describing, is that I'd have all atomic op methods be members of reqeust. The 
> request atomicOpFunctor would then point to one of its own members, based on 
> which the ISA isntruction sets it to when it creates the request - since the 
> atomic op type is encoded in the instruction. Memory/cache controllers would 
> then just call atomicOpFunctor. This avoids adding more MemCmds (introducing 
> MemSubCmds) and adding lots of if() checks on the MemCmd in the controllers, 
> but forces the introduction of the member atomic methods.
>     
>     I think a good compromise going forward would be the following:
>     
>     1) Add some sort of atomic rmw sub cmd and have the instruction set this 
> upon req creation.
>     2) Introduce src/mem/atomic_helpers.hh/cc, define all possible atomic 
> RMWs here as methods.
>     3) Have a LUT of function pointers - in mem_helpers.hh/cc - that has 
> function pointers to each of the atomic helper functions.
>     4) Have the memory controllers index into the atomic helpers LUT - based 
> on the sub cmd type - to execute the sub atomic.
>     
>     This would require only the additon of the sub cmds to the reques/packet. 
> Does this seem reasonable to you? If so I think this would be very doable, 
> but as a separate, later patch once we get our GPU code in.


Andreas, Curtis: do you have any thoughts on my proposal? Thanks.


- Tony


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3205/#review7813
-----------------------------------------------------------


On Jan. 6, 2016, 1:12 p.m., Tony Gutierrez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/3205/
> -----------------------------------------------------------
> 
> (Updated Jan. 6, 2016, 1:12 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11280:f5e328bd5233
> ---------------------------
> mem: support for gpu-style RMWs in ruby
> 
> This patch adds support for GPU-style read-modify-write (RMW) operations in
> ruby. Such atomic operations are traditionally executed at the memory 
> controller
> (instead of through an L1 cache using cache-line locking).
> 
> Currently, this patch works by propogating operation functors through the 
> memory
> system.
> 
> 
> Diffs
> -----
> 
>   src/base/types.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
>   src/mem/abstract_mem.cc d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
>   src/mem/packet.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
>   src/mem/protocol/RubySlicc_Exports.sm 
> d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
>   src/mem/request.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd 
> 
> Diff: http://reviews.gem5.org/r/3205/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Tony Gutierrez
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

Re: [gem5-dev] Review Request 3205: mem: support for gpu-style RMWs in ruby

Reply via email to