changeset f8fdd931e674 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=f8fdd931e674
description:
mem: Add cache clusivity
This patch adds a parameter to control the cache clusivity, that
is, whether the cache is mostly inclusive or mostly exclusive. At
the moment there is no intention to support strict policies, and
thus the options are: 1) mostly inclusive, or 2) mostly exclusive.
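As a minimal config sketch (purely illustrative: the cache class
names, sizes and associativities below are made up, and only the
clusivity parameter itself comes from this patch), an L2 can be made
mostly exclusive while the L1s stay mostly inclusive:

from m5.objects import Cache

# Hypothetical cache classes for illustration; only clusivity is new.
class L1DCache(Cache):
    size = '32kB'
    assoc = 2
    # L1 caches stay mostly inclusive, even with no upstream caches
    clusivity = 'mostly_incl'

class L2Cache(Cache):
    size = '1MB'
    assoc = 16
    # drop lines on hits from upstream, and only allocate on fills
    # that come directly from a non-caching source
    clusivity = 'mostly_excl'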
The choice of policy guides the behaviour on a cache fill, and a new
helper function, allocOnFill, is created to encapsulate the
decision-making process. In timing mode, the decision is annotated on
the MSHR when the downstream packet is sent out, and in atomic mode
we pass the decision directly to handleFill. We (ab)use the tempBlock
in cases where we are not allocating on fill, leaving the rest of the
cache unaffected. Simple and effective.
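Since the cache.hh hunk is truncated below, here is a rough,
self-contained sketch of the fill decision (not the actual gem5 code:
the Clusivity and Cmd enums are simplified stand-ins for
Enums::Clusivity and MemCmd):

#include <cassert>

// simplified stand-ins for gem5's Enums::Clusivity and MemCmd
enum class Clusivity { MostlyIncl, MostlyExcl };
enum class Cmd { ReadReq, WriteReq, WriteLineReq, HardPFReq, ReadExReq };

// A mostly inclusive cache allocates on every fill. A mostly
// exclusive cache only allocates when the miss came directly from a
// non-caching source (plain reads/writes, e.g. a table walker), from
// a whole-line write, or from its own prefetcher, since in those
// cases no upstream cache will hold the line.
bool allocOnFill(Clusivity clusivity, Cmd cmd)
{
    return clusivity == Clusivity::MostlyIncl ||
           cmd == Cmd::WriteLineReq ||
           cmd == Cmd::ReadReq ||
           cmd == Cmd::WriteReq ||
           cmd == Cmd::HardPFReq;
}

int main()
{
    // a ReadExReq can only come from an upstream cache, so a mostly
    // exclusive cache does not allocate and (ab)uses tempBlock instead
    assert(!allocOnFill(Clusivity::MostlyExcl, Cmd::ReadExReq));
    // a mostly inclusive cache always allocates on fill
    assert(allocOnFill(Clusivity::MostlyIncl, Cmd::ReadExReq));
    return 0;
}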
This patch also makes it more explicit that multiple caches are
allowed to consider a block writable (this was already the case
before this patch). That is, for a mostly inclusive cache, multiple
upstream caches may also consider the block exclusive. The caches
considering the block writable/exclusive all appear along the same
path to memory, and from a coherency protocol point of view it works
because we always snoop upwards in zero time before querying any
downstream cache.
Note that this patch does not introduce clean writebacks. Thus, for
clean lines we are essentially removing a cache level if it is made
mostly exclusive. For example, lines from the read-only L1
instruction cache or table-walker cache are always clean, and simply
get dropped rather than being passed to the L2. If the L2 is mostly
exclusive and does not allocate on fill, it will thus never hold the
line. A follow-on patch adds the clean writebacks.
The patch changes the L2 of the O3_ARM_v7a CPU configuration to be
mostly exclusive (and stats are affected accordingly).
diffstat:
configs/common/O3_ARM_v7a.py | 1 +
src/mem/cache/Cache.py | 16 ++++++
src/mem/cache/base.hh | 12 ++++-
src/mem/cache/cache.cc | 114 +++++++++++++++++++++++++++++++++---------
src/mem/cache/cache.hh | 71 ++++++++++++++++++++++++++-
src/mem/cache/mshr.cc | 14 ++++-
src/mem/cache/mshr.hh | 9 ++-
src/mem/cache/mshr_queue.cc | 4 +-
src/mem/cache/mshr_queue.hh | 3 +-
9 files changed, 207 insertions(+), 37 deletions(-)
diffs (truncated from 548 to 300 lines):
diff -r 53d4f7e452d6 -r f8fdd931e674 configs/common/O3_ARM_v7a.py
--- a/configs/common/O3_ARM_v7a.py Fri Nov 06 03:26:40 2015 -0500
+++ b/configs/common/O3_ARM_v7a.py Fri Nov 06 03:26:41 2015 -0500
@@ -185,6 +185,7 @@
assoc = 16
write_buffers = 8
prefetch_on_access = True
+ clusivity = 'mostly_excl'
# Simple stride prefetcher
prefetcher = StridePrefetcher(degree=8, latency = 1)
tags = RandomRepl()
diff -r 53d4f7e452d6 -r f8fdd931e674 src/mem/cache/Cache.py
--- a/src/mem/cache/Cache.py Fri Nov 06 03:26:40 2015 -0500
+++ b/src/mem/cache/Cache.py Fri Nov 06 03:26:41 2015 -0500
@@ -84,6 +84,22 @@
system = Param.System(Parent.any, "System we belong to")
+# Enum for cache clusivity, currently mostly inclusive or mostly
+# exclusive.
+class Clusivity(Enum): vals = ['mostly_incl', 'mostly_excl']
+
class Cache(BaseCache):
type = 'Cache'
cxx_header = 'mem/cache/cache.hh'
+
+ # Control whether this cache should be mostly inclusive or mostly
+ # exclusive with respect to upstream caches. The behaviour on a
+ # fill is determined accordingly. For a mostly inclusive cache,
+ # blocks are allocated on all fill operations. Thus, L1 caches
+ # should be set as mostly inclusive even if they have no upstream
+ # caches. In the case of a mostly exclusive cache, fills are not
+ # allocating unless they came directly from a non-caching source,
+ # e.g. a table walker. Additionally, on a hit from an upstream
+ # cache a line is dropped for a mostly exclusive cache.
+ clusivity = Param.Clusivity('mostly_incl',
+ "Clusivity with upstream cache")
diff -r 53d4f7e452d6 -r f8fdd931e674 src/mem/cache/base.hh
--- a/src/mem/cache/base.hh Fri Nov 06 03:26:40 2015 -0500
+++ b/src/mem/cache/base.hh Fri Nov 06 03:26:41 2015 -0500
@@ -210,7 +210,8 @@
// overlap
assert(addr == blockAlign(addr));
- MSHR *mshr = mq->allocate(addr, size, pkt, time, order++);
+ MSHR *mshr = mq->allocate(addr, size, pkt, time, order++,
+ allocOnFill(pkt->cmd));
if (mq->isFull()) {
setBlocked((BlockedCause)mq->index);
@@ -234,6 +235,15 @@
}
/**
+ * Determine if we should allocate on a fill or not.
+ *
+ * @param cmd Packet command being added as an MSHR target
+ *
+ * @return Whether we should allocate on a fill or not
+ */
+ virtual bool allocOnFill(MemCmd cmd) const = 0;
+
+ /**
* Write back dirty blocks in the cache using functional accesses.
*/
virtual void memWriteback() = 0;
diff -r 53d4f7e452d6 -r f8fdd931e674 src/mem/cache/cache.cc
--- a/src/mem/cache/cache.cc Fri Nov 06 03:26:40 2015 -0500
+++ b/src/mem/cache/cache.cc Fri Nov 06 03:26:41 2015 -0500
@@ -68,7 +68,11 @@
tags(p->tags),
prefetcher(p->prefetcher),
doFastWrites(true),
- prefetchOnAccess(p->prefetch_on_access)
+ prefetchOnAccess(p->prefetch_on_access),
+ clusivity(p->clusivity),
+ tempBlockWriteback(nullptr),
+ writebackTempBlockAtomicEvent(this, false,
+ EventBase::Delayed_Writeback_Pri)
{
tempBlock = new CacheBlk();
tempBlock->data = new uint8_t[blkSize];
@@ -198,10 +202,10 @@
if (blk->isDirty()) {
pkt->assertMemInhibit();
}
- // on ReadExReq we give up our copy unconditionally
- if (blk != tempBlock)
- tags->invalidate(blk);
- blk->invalidate();
+ // on ReadExReq we give up our copy unconditionally,
+ // even if this cache is mostly inclusive, we may want
+ // to revisit this
+ invalidateBlock(blk);
} else if (blk->isWritable() && !pending_downgrade &&
!pkt->sharedAsserted() &&
pkt->cmd != MemCmd::ReadCleanReq) {
@@ -220,9 +224,30 @@
if (!deferred_response) {
// if we are responding immediately and can
// signal that we're transferring ownership
- // along with exclusivity, do so
+ // (inhibit set) along with exclusivity
+ // (shared not set), do so
pkt->assertMemInhibit();
+
+ // if this cache is mostly inclusive, we keep
+ // the block as writable (exclusive), and pass
+ // it upwards as writable and dirty
+ // (modified), hence we have multiple caches
+ // considering the same block writable,
+ // something that we get away with due to the
+ // fact that: 1) this cache has been
+ // considered the ordering point and
+ // responded to all snoops up till now, and 2)
+ // we always snoop upwards before consulting
+ // the local cache, both on a normal request
+ // (snooping done by the crossbar), and on a
+ // snoop
blk->status &= ~BlkDirty;
+
+ // if this cache is mostly exclusive with
+ // respect to the cache above, drop the block
+ if (clusivity == Enums::mostly_excl) {
+ invalidateBlock(blk);
+ }
} else {
// if we're responding after our own miss,
// there's a window where the recipient didn't
@@ -241,9 +266,10 @@
// Upgrade or Invalidate, since we have it Exclusively (E or
// M), we ack then invalidate.
assert(pkt->isUpgrade() || pkt->isInvalidate());
- assert(blk != tempBlock);
- tags->invalidate(blk);
- blk->invalidate();
+
+ // for invalidations we could be looking at the temp block
+ // (for upgrades we always allocate)
+ invalidateBlock(blk);
DPRINTF(Cache, "%s for %s addr %#llx size %d (invalidation)\n",
__func__, pkt->cmdString(), pkt->getAddr(), pkt->getSize());
}
@@ -761,7 +787,8 @@
// buffer and to schedule an event to the queued
// port and also takes into account the additional
// delay of the xbar.
- mshr->allocateTarget(pkt, forward_time, order++);
+ mshr->allocateTarget(pkt, forward_time, order++,
+ allocOnFill(pkt->cmd));
if (mshr->getNumTargets() == numTarget) {
noTargetMSHR = mshr;
setBlocked(Blocked_NoTargets);
@@ -1027,13 +1054,15 @@
// write-line request to the cache that promoted
// the write to a whole line
- blk = handleFill(pkt, blk, writebacks);
+ blk = handleFill(pkt, blk, writebacks,
+ allocOnFill(pkt->cmd));
satisfyCpuSideRequest(pkt, blk);
} else if (bus_pkt->isRead() ||
bus_pkt->cmd == MemCmd::UpgradeResp) {
// we're updating cache state to allow us to
// satisfy the upstream request from the cache
- blk = handleFill(bus_pkt, blk, writebacks);
+ blk = handleFill(bus_pkt, blk, writebacks,
+ allocOnFill(pkt->cmd));
satisfyCpuSideRequest(pkt, blk);
} else {
// we're satisfying the upstream request without
@@ -1056,9 +1085,34 @@
// immediately rather than calling requestMemSideBus() as we do
// there).
- // Handle writebacks (from the response handling) if needed
+ // do any writebacks resulting from the response handling
doWritebacksAtomic(writebacks);
+ // if we used temp block, check to see if it's valid and if so
+ // clear it out, but only do so after the call to recvAtomic is
+ // finished so that any downstream observers (such as a snoop
+ // filter), first see the fill, and only then see the eviction
+ if (blk == tempBlock && tempBlock->isValid()) {
+ // the atomic CPU calls recvAtomic for fetch and load/store
+ // sequentially, and we may already have a tempBlock
+ // writeback from the fetch that we have not yet sent
+ if (tempBlockWriteback) {
+ // if that is the case, write the previous one back, and
+ // do not schedule any new event
+ writebackTempBlockAtomic();
+ } else {
+ // the writeback/clean eviction happens after the call to
+ // recvAtomic has finished (but before any successive
+ // calls), so that the response handling from the fill is
+ // allowed to happen first
+ schedule(writebackTempBlockAtomicEvent, curTick());
+ }
+
+ tempBlockWriteback = blk->isDirty() ? writebackBlk(blk) :
+ cleanEvictBlk(blk);
+ blk->invalidate();
+ }
+
if (pkt->needsResponse()) {
pkt->makeAtomicResponse();
}
@@ -1214,7 +1268,7 @@
DPRINTF(Cache, "Block for addr %#llx being updated in Cache\n",
pkt->getAddr());
- blk = handleFill(pkt, blk, writebacks);
+ blk = handleFill(pkt, blk, writebacks, mshr->allocOnFill);
assert(blk != NULL);
}
@@ -1258,7 +1312,7 @@
// deferred targets if possible
mshr->promoteExclusive();
// NB: we use the original packet here and not the response!
- blk = handleFill(tgt_pkt, blk, writebacks);
+ blk = handleFill(tgt_pkt, blk, writebacks, mshr->allocOnFill);
assert(blk != NULL);
// treat as a fill, and discard the invalidation
@@ -1362,9 +1416,7 @@
// should not invalidate the block, so check if the
// invalidation should be discarded
if (is_invalidate || mshr->hasPostInvalidate()) {
- assert(blk != tempBlock);
- tags->invalidate(blk);
- blk->invalidate();
+ invalidateBlock(blk);
} else if (mshr->hasPostDowngrade()) {
blk->status &= ~BlkWritable;
}
@@ -1588,6 +1640,13 @@
return blk;
}
+void
+Cache::invalidateBlock(CacheBlk *blk)
+{
+ if (blk != tempBlock)
+ tags->invalidate(blk);
+ blk->invalidate();
+}
// Note that the reason we return a list of writebacks rather than
// inserting them directly in the write buffer is that this function
@@ -1595,7 +1654,8 @@
// mode we don't mess with the write buffer (we just perform the
// writebacks atomically once the original request is complete).
CacheBlk*
-Cache::handleFill(PacketPtr pkt, CacheBlk *blk, PacketList &writebacks)
+Cache::handleFill(PacketPtr pkt, CacheBlk *blk, PacketList &writebacks,
+ bool allocate)
{
assert(pkt->isResponse() || pkt->cmd == MemCmd::WriteLineReq);
Addr addr = pkt->getAddr();
@@ -1619,11 +1679,14 @@
// happens in the subsequent satisfyCpuSideRequest.
assert(pkt->isRead() || pkt->cmd == MemCmd::WriteLineReq);
- // need to do a replacement
- blk = allocateBlock(addr, is_secure, writebacks);
+ // need to do a replacement if allocating, otherwise we stick
+ // with the temporary storage
+ blk = allocate ? allocateBlock(addr, is_secure, writebacks) : NULL;
+
if (blk == NULL) {
- // No replaceable block... just use temporary storage to
- // complete the current request and then get rid of it
+ // No replaceable block or a mostly exclusive
+ // cache... just use temporary storage to complete the
+ // current request and then get rid of it
assert(!tempBlock->isValid());
blk = tempBlock;
tempBlock->set = tags->extractSet(addr);
@@ -1877,6 +1940,7 @@
// applies both to reads and writes and that for writes it
// works thanks to the fact that we still have dirty data and
// will write it back at a later point
+ assert(!pkt->memInhibitAsserted());
pkt->assertMemInhibit();
if (have_exclusive) {
// in the case of an uncacheable request there is no point
@@ -1911,9 +1975,7 @@
// Do this last in case it deallocates block data or something
// like that
if (invalidate) {
- if (blk != tempBlock)
- tags->invalidate(blk);
- blk->invalidate();
+ invalidateBlock(blk);
}
DPRINTF(Cache, "new state is %s\n", blk->print());
diff -r 53d4f7e452d6 -r f8fdd931e674 src/mem/cache/cache.hh
--- a/src/mem/cache/cache.hh Fri Nov 06 03:26:40 2015 -0500