> On Jan. 6, 2016, 1:39 p.m., Andreas Hansson wrote:
> > I still find that the design needs a far better explanation, and at least
> > one simple example. I would suggest implementing the SwapReq as an atomic
> > op (as I guess it is our only example). Also, surely we need to impose
> > restrictions on the granularity? If we use SwapReq as an example then
> > things would be much more clear.
> >
> > I am also not sure about Joel's comments having been addressed.
>
> Tony Gutierrez wrote:
> Our GPU code has lots of examples for AtomicOps, and the concept is
> pretty simple: use function pointers to perform the atomic RMW op instead of
> adding a bunch of conditions to check all possible SwapReqs. Here is how our
> GPU CAS is implemented, and you can see all AtomicOps in gpu_dyn_inst.hh:
>
> template<typename T>
> class AtomicOpCAS : public TypedAtomicOpFunctor<T>
> {
> public:
> T c;
> T s;
>
> ComputeUnit *computeUnit;
>
> AtomicOpCAS(T _c, T _s, ComputeUnit *compute_unit)
> : c(_c), s(_s), computeUnit(compute_unit) { }
>
> void
> execute(T *b)
> {
> computeUnit->numCASOps++;
>
> if (*b == c) {
> *b = s;
> } else {
> computeUnit->numFailedCASOps++;
> }
>
> if (computeUnit->xact_cas_mode) {
> computeUnit->xactCasLoadMap.clear();
> }
> }
> };
>
> A call to the functor will call execute(). In this case the ISA inst
> objects set the swap value 's' based on one of its src operands, and if the
> current value 'b' hasn't changed from the orignial 'c' (another src operand),
> b gets s.
>
> I think implementing the SwapReq as an AtomicOp would be as simple as
> moving the logic in if (pkt->cmd == MemCmd::SwapReq) in abstract_mem.cc to a
> derived AtomicOpFunctor class and calling it, but I would prefer not to make
> that change part of this patch. I'm not sure how the AtomicOpFunctor would be
> setup by the CPU ISAs here and I don't have the time or the infrastructure to
> test the changes sufficiently to ensure they work properly for the CPU ISAs
> that use this (ARM and SPARC?), so I think it'd be safer to leave it as it is
> for now.
>
> Getting this patch in is necessary for us to push our GPU code, which is
> pretty important to lots of people in the community - at least based on the
> e-mails from users I've been getting - and I would hope this issue wouldn't
> hold that up.
>
> As far as restricting the granularity, I can't imagine a case where one
> would try to perform a RMW on a size greater than a cache line, but to ensure
> it I can add a fatal if the size is greater than a cache block.
>
> Andreas Hansson wrote:
> I am still trying to come to terms with the overall design choices, and
> my feeling is that this opens up for lots of complexity (or at least
> non-transparency).
>
> My main hesitation is that this completely departs from the existing gem5
> self-sufficient request/packet structure, and by doing so makes wrapping in
> TLM2 or SST more or less impossible. The bridging is really important, and so
> to justify a change like this we need to carefully consider the pros and cons.
>
> What is the actual data payload of the packet in your example? Does it
> not make more sense to at least tie these operations to the request or packet
> (not just by pointers), and that way make sure that at least the payload
> reflects the current state?
>
> Tony Gutierrez wrote:
> Ah, your issue is clear to me now. The actual payload, unfortunately, IS
> the functor. As I mentioned to Joel previously, these carry the data around
> with them, which is why the original author used functors instead of function
> pointers. This is also something I have been meaning to fix, but haven't had
> time.
>
> If I move all the functors to the request would you be ok with me pushing
> this now? The ultimate goal being that the req would carry a regular function
> pointer to a member method around, and the data payload will be part of the
> packet data/extra data, and will be moved out of the functors.
>
> Joel Hestness wrote:
> Yep, it sounds like Andreas is interested in the same thing as me. The
> thing is, if we changed this patch as Tony offers above, we still need the
> function pointer in the request or packet metadata, so it doesn't really
> address the issue. It's more busy work, so I'd rather just advocate for
> addressing this in a later patch.
>
> I forgot to mention previously that gem5-gpu has code for packing atomics
> into packet payload (line ~355:
> http://gem5-gpu.cs.wisc.edu/repo/gem5-gpu/file/92d1e15704a4/src/gpu/lsq_warp_inst_buffer.cc
> ). The process is a little tricky and uses data size rather than templates
> there, but might be instructive when we get to that.
>
> Andreas Hansson wrote:
> I'd be far more comfortable with these changes, but I would also like the
> people that have contributed the SST and TLM wrappers to have a look. I
> really worry about interoperability here.
>
> Tony Gutierrez wrote:
> We agree with you in principle on the design not being great, and as I
> mentioned this is somethign that was already on my 'nice to have' list, but
> it wasn't high priorty. Now that we understand your conerns better, it has
> become higher priority. That said, I think our code is isolated relatively
> well at this point, so it shoudn't affect anything as of now. The pointer
> will be carried around by the request, but if you're not running GPU code it
> shouldn't matter - unless the current CAS were to be implemented as an
> AtomicOpFunctor.
>
> I can try to (quickly, thus dangerously) fix up this patch before
> shipping, but I'd prefer not to as to avoid breaking any current ISAs.
> Alternatively I can push it as-is, then when I have more time in a few weeks,
> I (hopefully with input from Joel) can try to unify the design properly, and
> move the data out of the functors and into the packets.
>
> Curtis Dunham wrote:
> This type of approach seems entirely incompatible with an external memory
> interface like SST (which I maintain the wrapper for). I agree with the data
> going into the packet instead of the functor, but then you're still left with
> the functor, which is an overly general interface. Out of curiosity, how
> many of these operations would there be? I think the cleanest thing would be
> for them to all be their own MemCmd - would we be talking about just a few
> more commands, or dozens?
>
> Tony Gutierrez wrote:
> I believe there are 22 atomic RMW operations in the HSAIL ISA our GPU
> model executes; clearly this would lead to too many new MemCmd types, and
> increase the complexity of the logic for performing atomic RMWs. I think the
> design of simply specifying the MemCmd as an atomic RMW (SwapReq), and then
> just executing the functor, which would then perform the actual operation, is
> a much cleaner design. And this is doable becaue the atomic RMW ops have the
> same API. I guess I'm not understanding why the use of a general function ptr
> here would cause a problem for the SST interface - which I know next to
> nothing about admittedly.
>
> If you have a function pointer available in the packet, but aren't using
> it at all, would this still break things?
>
> If all current, necessary RMW (only CAS afaik) were to be implemented as
> member methods of request, then the atomic op function pointer would just be
> set to the appropriate member method based on which atomic is actually being
> executed, thereby restoring self sufficiency of the packet/req. Would that
> still break the ineroperability?
>
> I understand why moving data out of the packet breaks things, but I'm
> still not clear as to why using a function pointer to call the actual atomic
> op would break things.
>
> Brad Beckmann wrote:
> Curtis and Andreas, we need to check in this patch. Like Tony said, this
> patch does not break the external memory interface and SST. Yes, it is
> incompatible, but if you don't use the functor, nothing should break. Please
> stop demanding perfection the first time a feature is implemented. We are
> committed to further improving the feature and we will make more of the
> changes suggested by you and Joel. However we to move on from this initial
> patch. This is an open source project and none of us are being paid to
> develop perfect gem5 code.
>
> In general, please keep in mind that our team rarely (if ever) demands
> perfect code from you. We would appreciate if you reciprocated.
>
> Andreas Hansson wrote:
> We are not asking for perfection, we are asking for a design that takes a
> balanced approach in accommodating existing and future use-cases.
>
> I am increasingly convinced that we should just go for encoding of the
> operations in the packet MemCmd, possibly with the addition of a MemSubCmd.
> For example, looking at the ARM atomic operations, they are split into atomic
> compare, atomic swap, atomic load and atomic store. The two latter operations
> are then combined with a sub-operation, like XOR, ADD, MAX, MIN etc. I'd
> imagine this largely overlaps with the operations you are introducing. Don't
> you think? The implementation could be put in a src/mem/atomic_helpers.hh/cc
> or similar and called by the AbstractMemory, or any caches.
>
> Keeping everything in the packet is relly the key concern here, and to
> answer Tony's question, the current bridges rely on a lookup translation of
> the fields on the packet. No functions are being called "through" the bridge.
> I hope that makes things more clear.
>
> Tony Gutierrez wrote:
> That is fine. We're open to changing the design going forward - I was
> already planning on doing so, and it sounds like our respective design
> proposals would be very similar - and yes, I'm sure the design could converge
> for all ISA atomics. Doing so, however, is going to be a fairly significant
> effort. Given that it will take time to converge on the design, then
> implement it, we'd prefer not to have that hold up this patch, thereby
> holding up our GPU code.
>
> I understand the functions are not being called, I'm just curious how
> adding a new field to the packet would break that. I was imagining that a
> gem5 packet would be embedded in some sort of 'packet' SST et al. could
> consume, and SST would know how to interpret gem5's packet fields. In such a
> case why can't you just ignore that function pointer field? Since you're not
> using any atomic ops, it would always be null anyhow. When we eventually
> clean this up, the possible functions the pointer could point to would be
> included in Request.
>
> The only difference between what I was envisioning, and what you're
> describing, is that I'd have all atomic op methods be members of reqeust. The
> request atomicOpFunctor would then point to one of its own members, based on
> which the ISA isntruction sets it to when it creates the request - since the
> atomic op type is encoded in the instruction. Memory/cache controllers would
> then just call atomicOpFunctor. This avoids adding more MemCmds (introducing
> MemSubCmds) and adding lots of if() checks on the MemCmd in the controllers,
> but forces the introduction of the member atomic methods.
>
> I think a good compromise going forward would be the following:
>
> 1) Add some sort of atomic rmw sub cmd and have the instruction set this
> upon req creation.
> 2) Introduce src/mem/atomic_helpers.hh/cc, define all possible atomic
> RMWs here as methods.
> 3) Have a LUT of function pointers - in mem_helpers.hh/cc - that has
> function pointers to each of the atomic helper functions.
> 4) Have the memory controllers index into the atomic helpers LUT - based
> on the sub cmd type - to execute the sub atomic.
>
> This would require only the additon of the sub cmds to the reques/packet.
> Does this seem reasonable to you? If so I think this would be very doable,
> but as a separate, later patch once we get our GPU code in.
>
> Tony Gutierrez wrote:
> Andreas, Curtis: do you have any thoughts on my proposal? Thanks.
>
> Curtis Dunham wrote:
> You are right that the existing use-cases for external memory (e.g. with
> SST) wouldn't be using these atomic ops and so we could just assert that the
> function pointer is null and everything would be fine. The problem we're
> trying to address is not making it impossible to create a mapping so an
> external memory system like SST could deal with it. The fundamental problem
> was the opacity of the function pointer -- if the operations can be
> labeled/enumerated, then that should be sufficient... and I think your
> proposal accomplishes this. One thing I don't understand (and at this point
> it's not a big deal, but a curiosity certainly) is why we would avoid adding
> new MemCmd's but the notion of adding a SubCmd is fine? Obviously if SST did
> it the other way, hypothetically, the mapping would be trivial -- just trying
> to understand the design philosophy. But yes this sounds like a much better
> long term approach than leaving it just as a functor.
>
> The way that you want to have each operation be members of request, I
> don't completely understand this design vs. having say an abstract inner
> class of Request that each atomic op implements. This class could just as
> well provide the SubCmd enum value that it corresponds to. I may be
> misunderstanding your explanation, though. (admittedly this is just "how you
> dispatch your polymorphism", and virtual functions feel more natural to me
> than pointers to member functions)
>
> Tony Gutierrez wrote:
> There are many ways to accomplish what we need, which is why I would
> rather put it off for another patch at a later time where we can more
> thoroughly discuss and vet the design, rather than have me delay this patch
> or hack something together quickly.
>
> The reason I didn't want to add a bunch of MemCmds was too prevent
> changing the packet/req more than necessary - primarily to avoid causing you
> guys issues. I had to go through significant effort to refactor many of our
> patches - to undo such changes - to get them throug the review process
> prevously. So I was anticipating a lot of push back had I done something like
> that in order to remove our use of functors. Andreas was the one who
> mentioned MemSubCmd and I figured if he was ok with it, then it would be
> fine. I don't have an strong opinion either way.
>
> Note that making the atomic ops members of request, while keeping the
> function pointers, was my original intention. My latest proposal is
> essentially what Andreas described verbatim, with the addition of the LUT to
> the atomic_helpers.[hh/cc] in order to avoid if checks to figure out which
> atomic helper to call. Having dervied request types is also another option,
> but again I was trying to avoid perturbing the "self-sufficiency" of the
> request/packet interface.
>
> Andreas Hansson wrote:
> I think the solution you propose sounds ok Tony. I am inclined to suggest
> we don't put it off though, and get the re-work done before committing this,
> at least the broad brush strokes of what you propose in your four items above.
>
> Tony Gutierrez wrote:
> Hi, Andreas and Curtis. We discussed this internally, and while we
> appreciate your sentiment on this – and your suggestion to not put off our
> proposed changes – we must insist on pushing this code as is. So unless you
> strenuously object to this, e.g., this code would break functionality for you
> immediately, we will push this tomorrow with a commitment to implement what I
> proposed.
>
> We don't have time to commit to this now, and maintaining these patches
> in and of itself is a lot of work, and delaying them only creates unnecessary
> work for us, further delaying any progress. And again, I don’t want to break
> other ISAs by hastily implementing the features I propose. I think the
> benefit of getting the GPU model into users hands and in the code base, being
> tested by regressions, is more critical at this point that fixing up the
> atomic op implementation.
>
> Andreas Hansson wrote:
> From my perspective the total effort is constant (do it now or later), so
> better just get it right. I'm not thrilled at the idea of the IOU, but I
> appreciate the desire to move things forward. When can we expect this to be
> addressed?
>
> Also, it would be good if the patch description and comments were updated
> to reflect that this is generic atomic operations (not just GPU RMWs, and not
> just Ruby), and the review discussion outcomes should probably be included
> somehow, i.e. highlighting the issues and the envisioned solution.
Thanks for understanding. I agree the effort to complete the eventual
implementation is constant, however we don't X amount of time to do it now.
Putting off committing these essentially makes the amount of time needed X +
whatever time it takes to keep rebasing/testing patches everytime the base
changes, as well as delays getting important features into the code base.
I can certainly add the description/comments you requested before shipping this
tomorrow. I anticipate I will have sufficient time to properly fix this code
sometime next month.
- Tony
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3205/#review7813
-----------------------------------------------------------
On Jan. 6, 2016, 1:12 p.m., Tony Gutierrez wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/3205/
> -----------------------------------------------------------
>
> (Updated Jan. 6, 2016, 1:12 p.m.)
>
>
> Review request for Default.
>
>
> Repository: gem5
>
>
> Description
> -------
>
> Changeset 11280:f5e328bd5233
> ---------------------------
> mem: support for gpu-style RMWs in ruby
>
> This patch adds support for GPU-style read-modify-write (RMW) operations in
> ruby. Such atomic operations are traditionally executed at the memory
> controller
> (instead of through an L1 cache using cache-line locking).
>
> Currently, this patch works by propogating operation functors through the
> memory
> system.
>
>
> Diffs
> -----
>
> src/base/types.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd
> src/mem/abstract_mem.cc d9a0136ab8cc4b3cf4821d064140b857e60db0dd
> src/mem/packet.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd
> src/mem/protocol/RubySlicc_Exports.sm
> d9a0136ab8cc4b3cf4821d064140b857e60db0dd
> src/mem/request.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd
>
> Diff: http://reviews.gem5.org/r/3205/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Tony Gutierrez
>
>
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev