> On Jan. 6, 2016, 9:39 p.m., Andreas Hansson wrote:
> > I still find that the design needs a far better explanation, and at least
> > one simple example. I would suggest implementing the SwapReq as an atomic
> > op (as I guess it is our only example). Also, surely we need to impose
> > restrictions on the granularity? If we use SwapReq as an example then
> > things would be much more clear.
> >
> > I am also not sure about Joel's comments having been addressed.
>
> Tony Gutierrez wrote:
> Our GPU code has lots of examples for AtomicOps, and the concept is
> pretty simple: use function pointers to perform the atomic RMW op instead of
> adding a bunch of conditions to check all possible SwapReqs. Here is how our
> GPU CAS is implemented, and you can see all AtomicOps in gpu_dyn_inst.hh:
>
> template<typename T>
> class AtomicOpCAS : public TypedAtomicOpFunctor<T>
> {
> public:
> T c;
> T s;
>
> ComputeUnit *computeUnit;
>
> AtomicOpCAS(T _c, T _s, ComputeUnit *compute_unit)
> : c(_c), s(_s), computeUnit(compute_unit) { }
>
> void
> execute(T *b)
> {
> computeUnit->numCASOps++;
>
> if (*b == c) {
> *b = s;
> } else {
> computeUnit->numFailedCASOps++;
> }
>
> if (computeUnit->xact_cas_mode) {
> computeUnit->xactCasLoadMap.clear();
> }
> }
> };
>
> A call to the functor will call execute(). In this case the ISA inst
> objects set the swap value 's' based on one of its src operands, and if the
> current value 'b' hasn't changed from the orignial 'c' (another src operand),
> b gets s.
>
> I think implementing the SwapReq as an AtomicOp would be as simple as
> moving the logic in if (pkt->cmd == MemCmd::SwapReq) in abstract_mem.cc to a
> derived AtomicOpFunctor class and calling it, but I would prefer not to make
> that change part of this patch. I'm not sure how the AtomicOpFunctor would be
> setup by the CPU ISAs here and I don't have the time or the infrastructure to
> test the changes sufficiently to ensure they work properly for the CPU ISAs
> that use this (ARM and SPARC?), so I think it'd be safer to leave it as it is
> for now.
>
> Getting this patch in is necessary for us to push our GPU code, which is
> pretty important to lots of people in the community - at least based on the
> e-mails from users I've been getting - and I would hope this issue wouldn't
> hold that up.
>
> As far as restricting the granularity, I can't imagine a case where one
> would try to perform a RMW on a size greater than a cache line, but to ensure
> it I can add a fatal if the size is greater than a cache block.
I am still trying to come to terms with the overall design choices, and my
feeling is that this opens up for lots of complexity (or at least
non-transparency).
My main hesitation is that this completely departs from the existing gem5
self-sufficient request/packet structure, and by doing so makes wrapping in
TLM2 or SST more or less impossible. The bridging is really important, and so
to justify a change like this we need to carefully consider the pros and cons.
What is the actual data payload of the packet in your example? Does it not make
more sense to at least tie these operations to the request or packet (not just
by pointers), and that way make sure that at least the payload reflects the
current state?
- Andreas
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3205/#review7813
-----------------------------------------------------------
On Jan. 6, 2016, 9:12 p.m., Tony Gutierrez wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/3205/
> -----------------------------------------------------------
>
> (Updated Jan. 6, 2016, 9:12 p.m.)
>
>
> Review request for Default.
>
>
> Repository: gem5
>
>
> Description
> -------
>
> Changeset 11280:f5e328bd5233
> ---------------------------
> mem: support for gpu-style RMWs in ruby
>
> This patch adds support for GPU-style read-modify-write (RMW) operations in
> ruby. Such atomic operations are traditionally executed at the memory
> controller
> (instead of through an L1 cache using cache-line locking).
>
> Currently, this patch works by propogating operation functors through the
> memory
> system.
>
>
> Diffs
> -----
>
> src/base/types.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd
> src/mem/abstract_mem.cc d9a0136ab8cc4b3cf4821d064140b857e60db0dd
> src/mem/packet.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd
> src/mem/protocol/RubySlicc_Exports.sm
> d9a0136ab8cc4b3cf4821d064140b857e60db0dd
> src/mem/request.hh d9a0136ab8cc4b3cf4821d064140b857e60db0dd
>
> Diff: http://reviews.gem5.org/r/3205/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Tony Gutierrez
>
>
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev