[gem5-dev] changeset in gem5: mem: Add clean evicts to improve snoop filter...

Ali Jafri Fri, 03 Jul 2015 07:16:58 -0700

changeset 9294c4a60251 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=9294c4a60251
description:
        mem: Add clean evicts to improve snoop filter tracking


        This patch adds eviction notices to the caches, to provide accurate
        tracking of cache blocks in snoop filters. We add the CleanEvict
        message to the memory hierarchy and use both CleanEvicts and
        Writebacks with BLOCK_CACHED flags to propagate notice of clean and
        dirty evictions, respectively, down the memory hierarchy. Note that the
        BLOCK_CACHED flag indicates whether there exist any copies of the
        evicted block in the caches above the evicting cache.
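
        As a rough illustration of the tracking described above, the
        following toy model shows how a snoop filter might react to eviction
        notices. It is not gem5 code: SnoopCmd, EvictNotice, ToySnoopFilter
        and their members are hypothetical names, and the filter is
        simplified to a single set of addresses rather than per-port state.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-ins for the two eviction messages.
enum class SnoopCmd { CleanEvict, Writeback };

struct EvictNotice {
    SnoopCmd cmd;
    std::uint64_t addr;
    bool blockCached;  // stands in for the BLOCK_CACHED flag
};

// Toy snoop filter: remembers which block addresses are cached above it.
class ToySnoopFilter {
  public:
    void recordFill(std::uint64_t addr) { present.insert(addr); }

    bool holds(std::uint64_t addr) const { return present.count(addr) != 0; }

    // Forget the block unless the notice says a copy survives above.
    void recordEviction(const EvictNotice &notice) {
        // Assumption of this sketch: notices only arrive for tracked blocks.
        assert(holds(notice.addr));
        if (!notice.blockCached)
            present.erase(notice.addr);
    }

  private:
    std::unordered_set<std::uint64_t> present;
};
```

        In this sketch a Writeback carrying the flag leaves the entry in
        place, while a CleanEvict or an unflagged Writeback clears it.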

        The purpose of the CleanEvict message is to notify snoop filters of
        silent evictions in the relevant caches. The CleanEvict message
        behaves much like a Writeback. CleanEvict is a write and a request,
        but unlike a Writeback, CleanEvict does not have data and does not
        need exclusive access to the block. The cache generates the
        CleanEvict message on a fill resulting in eviction of a clean block.
        Before travelling downwards, CleanEvict requests generate zero-time
        snoop requests to check if the same block is cached in upper levels
        of the memory hierarchy. If the block exists, the cache discards the
        CleanEvict message. The snoops check the tags, writeback queue and the
        MSHRs of upper level caches in a manner similar to snoops generated
        from HardPFReqs. Currently, CleanEvicts keep travelling towards main
        memory until they either encounter the block corresponding to their
        address or reach main memory (since we have no well-defined point of
        serialisation). Main memory simply discards CleanEvict messages.
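
        The path of a CleanEvict described above can be sketched with a toy
        model of a linear cache hierarchy. This is only an illustration under
        simplifying assumptions (one cache per level, no writeback queues or
        MSHRs); ToyLevel and routeCleanEvict are hypothetical names, not
        gem5 functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Toy model: each level is just the set of block addresses it holds.
using ToyLevel = std::unordered_set<std::uint64_t>;

// levels[0] is closest to the CPU; main memory sits below the last entry.
// Returns the index of the level where a CleanEvict issued by levels[from]
// stops, or levels.size() if it reaches main memory and is discarded there.
std::size_t
routeCleanEvict(const std::vector<ToyLevel> &levels, std::size_t from,
                std::uint64_t addr)
{
    assert(from < levels.size());

    // Snoop upwards first: if any upper level still holds the block, the
    // evicting cache discards the CleanEvict before sending it anywhere.
    for (std::size_t i = 0; i < from; ++i) {
        if (levels[i].count(addr) != 0)
            return from;
    }

    // Otherwise travel downwards until some level holds the block.
    for (std::size_t i = from + 1; i < levels.size(); ++i) {
        if (levels[i].count(addr) != 0)
            return i;
    }

    // Nobody below had the block: main memory drops the message.
    return levels.size();
}
```

        A return value equal to from means the message never left the
        evicting cache; a value equal to levels.size() means it travelled
        all the way to main memory.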

        We have modified the behavior of Writebacks, such that they generate
        snoops to check for the presence of blocks in upper level caches. It
        is possible in our current implementation for a lower level cache to be
        writing back a block while a shared copy of the same block exists in
        the upper level cache. If the snoops find the same block in upper
        level caches, we set the BLOCK_CACHED flag in the Writeback message.
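
        A simplified, standalone sketch of this decision follows. It models
        the result of the upward snoop as a set of addresses known to be
        cached above, which is an assumption made for illustration;
        EvictKind, EvictPacket and queueEvictions are hypothetical names
        and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

enum class EvictKind { CleanEvict, Writeback };

struct EvictPacket {
    EvictKind kind;
    std::uint64_t addr;
    bool blockCached;  // stands in for the BLOCK_CACHED flag
};

// Decide what reaches the write buffer. cachedAbove models the outcome of
// the zero-time snoop into upper level caches.
std::vector<EvictPacket>
queueEvictions(const std::vector<EvictPacket> &evictions,
               const std::unordered_set<std::uint64_t> &cachedAbove)
{
    std::vector<EvictPacket> writeBuffer;
    for (EvictPacket pkt : evictions) {
        // Assumption of this sketch: fresh evictions start with the flag clear.
        assert(!pkt.blockCached);
        const bool above = cachedAbove.count(pkt.addr) != 0;
        if (above && pkt.kind == EvictKind::CleanEvict)
            continue;  // a copy exists above, so the notice is dropped
        pkt.blockCached = above;  // only a Writeback can end up flagged
        writeBuffer.push_back(pkt);
    }
    return writeBuffer;
}
```

        The flagged Writeback still carries its data downwards; the flag only
        tells whatever sits below that a copy of the block remains above.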

        We have also added logic to account for the interaction of other
        message types with CleanEvicts waiting in the writeback queue. A
        simple example is a response arriving at a cache, which removes any
        CleanEvicts to the same address from that cache's writeback queue.
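
        The example in the previous paragraph might look roughly like the
        sketch below, which models the writeback queue as a plain deque.
        QueuedKind, QueuedEntry and dropCleanEvictsOnResponse are
        hypothetical names chosen for this illustration; the real queue in
        gem5 is MSHR-based and more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

enum class QueuedKind { CleanEvict, Writeback };

struct QueuedEntry {
    QueuedKind kind;
    std::uint64_t addr;
};

// On a response for respAddr, remove queued CleanEvicts to that address.
// Writebacks are left alone. Returns the number of entries removed.
std::size_t
dropCleanEvictsOnResponse(std::deque<QueuedEntry> &queue,
                          std::uint64_t respAddr)
{
    const std::size_t before = queue.size();
    queue.erase(std::remove_if(queue.begin(), queue.end(),
                               [respAddr](const QueuedEntry &entry) {
                                   return entry.kind == QueuedKind::CleanEvict
                                       && entry.addr == respAddr;
                               }),
                queue.end());
    assert(queue.size() <= before);
    return before - queue.size();
}
```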

diffstat:

 src/mem/abstract_mem.cc        |   12 +-
 src/mem/cache/cache.hh         |   18 +
 src/mem/cache/cache_impl.hh    |  426 +++++++++++++++++++++++++++++-----------
 src/mem/cache/prefetch/base.cc |    1 +
 src/mem/coherent_xbar.cc       |   19 +
 src/mem/coherent_xbar.hh       |    7 +
 src/mem/dram_ctrl.cc           |    7 +-
 src/mem/packet.cc              |    2 +
 src/mem/packet.hh              |   23 ++
 src/mem/snoop_filter.cc        |   12 +-
 10 files changed, 401 insertions(+), 126 deletions(-)

diffs (truncated from 852 to 300 lines):

diff -r 3e84b8b49c77 -r 9294c4a60251 src/mem/abstract_mem.cc
--- a/src/mem/abstract_mem.cc   Fri Jul 03 10:14:36 2015 -0400
+++ b/src/mem/abstract_mem.cc   Fri Jul 03 10:14:37 2015 -0400
@@ -322,15 +322,21 @@
 void
 AbstractMemory::access(PacketPtr pkt)
 {
-    assert(AddrRange(pkt->getAddr(),
-                     pkt->getAddr() + pkt->getSize() - 1).isSubset(range));
-
     if (pkt->memInhibitAsserted()) {
         DPRINTF(MemoryAccess, "mem inhibited on 0x%x: not responding\n",
                 pkt->getAddr());
         return;
     }
 
+    if (pkt->cmd == MemCmd::CleanEvict) {
+        DPRINTF(MemoryAccess, "CleanEvict  on 0x%x: not responding\n",
+                pkt->getAddr());
+      return;
+    }
+
+    assert(AddrRange(pkt->getAddr(),
+                     pkt->getAddr() + (pkt->getSize() - 1)).isSubset(range));
+
     uint8_t *hostAddr = pmemAddr + pkt->getAddr() - range.start();
 
     if (pkt->cmd == MemCmd::SwapReq) {
diff -r 3e84b8b49c77 -r 9294c4a60251 src/mem/cache/cache.hh
--- a/src/mem/cache/cache.hh    Fri Jul 03 10:14:36 2015 -0400
+++ b/src/mem/cache/cache.hh    Fri Jul 03 10:14:37 2015 -0400
@@ -246,6 +246,11 @@
     bool recvTimingReq(PacketPtr pkt);
 
     /**
+     * Insert writebacks into the write buffer
+     */
+    void doWritebacks(PacketList& writebacks, Tick forward_time);
+
+    /**
      * Handles a response (cache line fill/write ack) from the bus.
      * @param pkt The response packet
      */
@@ -308,6 +313,13 @@
      */
     PacketPtr writebackBlk(CacheBlk *blk);
 
+    /**
+     * Create a CleanEvict request for the given block.
+     * @param blk The block to evict.
+     * @return The CleanEvict request for the block.
+     */
+    PacketPtr cleanEvictBlk(CacheBlk *blk);
+
 
     void memWriteback();
     void memInvalidate();
@@ -359,6 +371,12 @@
     MSHR *getNextMSHR();
 
     /**
+     * Send up a snoop request and find cached copies. If cached copies are
+     * found, set the BLOCK_CACHED flag in pkt.
+     */
+    bool isCachedAbove(const PacketPtr pkt) const;
+
+    /**
      * Selects an outstanding request to service.  Called when the
      * cache gets granted the downstream bus in timing mode.
      * @return The request to service, NULL if none found.
diff -r 3e84b8b49c77 -r 9294c4a60251 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh       Fri Jul 03 10:14:36 2015 -0400
+++ b/src/mem/cache/cache_impl.hh       Fri Jul 03 10:14:37 2015 -0400
@@ -334,6 +334,36 @@
             pkt->getAddr(), pkt->getSize(), pkt->isSecure() ? "s" : "ns",
             blk ? "hit " + blk->print() : "miss");
 
+
+    if (pkt->evictingBlock()) {
+        // We check for presence of block in above caches before issuing
+        // Writeback or CleanEvict to write buffer. Therefore the only
+        // possible cases can be of a CleanEvict packet coming from above
+        // encountering a Writeback generated in this cache peer cache and
+        // waiting in the write buffer. Cases of upper level peer caches
+        // generating CleanEvict and Writeback or simply CleanEvict and
+        // CleanEvict almost simultaneously will be caught by snoops sent out
+        // by crossbar.
+        std::vector<MSHR *> outgoing;
+        if (writeBuffer.findMatches(pkt->getAddr(), pkt->isSecure(),
+                                   outgoing)) {
+            assert(outgoing.size() == 1);
+            PacketPtr wbPkt = outgoing[0]->getTarget()->pkt;
+            assert(pkt->cmd == MemCmd::CleanEvict &&
+                   wbPkt->cmd == MemCmd::Writeback);
+            // As the CleanEvict is coming from above, it would have snooped
+            // into other peer caches of the same level while traversing the
+            // crossbar. If a copy of the block had been found, the CleanEvict
+            // would have been deleted in the crossbar. Now that the
+            // CleanEvict is here we can be sure none of the other upper level
+            // caches connected to this cache have the block, so we can clear
+            // the BLOCK_CACHED flag in the Writeback if set and discard the
+            // CleanEvict by returning true.
+            wbPkt->clearBlockCached();
+            return true;
+        }
+    }
+
     // Writeback handling is special case.  We can write the block into
     // the cache without having a writeable copy (or any copy at all).
     if (pkt->cmd == MemCmd::Writeback) {
@@ -363,6 +393,19 @@
         DPRINTF(Cache, "%s new state is %s\n", __func__, blk->print());
         incHitCount(pkt);
         return true;
+    } else if (pkt->cmd == MemCmd::CleanEvict) {
+        if (blk != NULL) {
+            // Found the block in the tags, need to stop CleanEvict from
+            // propagating further down the hierarchy. Returning true will
+            // treat the CleanEvict like a satisfied write request and delete
+            // it.
+            return true;
+        }
+        // We didn't find the block here, propagate the CleanEvict further
+        // down the memory hierarchy. Returning false will treat the CleanEvict
+        // like a Writeback which could not find a replaceable block so has to
+        // go to next level.
+        return false;
     } else if ((blk != NULL) &&
                (pkt->needsExclusive() ? blk->isWritable()
                                       : blk->isReadable())) {
@@ -395,6 +438,41 @@
 };
 
 void
+Cache::doWritebacks(PacketList& writebacks, Tick forward_time)
+{
+    while (!writebacks.empty()) {
+        PacketPtr wbPkt = writebacks.front();
+        // We use forwardLatency here because we are copying writebacks to
+        // write buffer.  Call isCachedAbove for both Writebacks and
+        // CleanEvicts. If isCachedAbove returns true we set BLOCK_CACHED flag
+        // in Writebacks and discard CleanEvicts.
+        if (isCachedAbove(wbPkt)) {
+            if (wbPkt->cmd == MemCmd::CleanEvict) {
+                // Delete CleanEvict because cached copies exist above. The
+                // packet destructor will delete the request object because
+                // this is a non-snoop request packet which does not require a
+                // response.
+                delete wbPkt;
+            } else {
+                // Set BLOCK_CACHED flag in Writeback and send below, so that
+                // the Writeback does not reset the bit corresponding to this
+                // address in the snoop filter below.
+                wbPkt->setBlockCached();
+                allocateWriteBuffer(wbPkt, forward_time, true);
+            }
+        } else {
+            // If the block is not cached above, send packet below. Both
+            // CleanEvict and Writeback with BLOCK_CACHED flag cleared will
+            // reset the bit corresponding to this address in the snoop filter
+            // below.
+            allocateWriteBuffer(wbPkt, forward_time, true);
+        }
+        writebacks.pop_front();
+    }
+}
+
+
+void
 Cache::recvTimingSnoopResp(PacketPtr pkt)
 {
     DPRINTF(Cache, "%s for %s addr %#llx size %d\n", __func__,
@@ -510,7 +588,7 @@
 
         /// @todo nominally we should just delete the packet here,
         /// however, until 4-phase stuff we can't because sending
-        /// cache is still relying on it
+        /// cache is still relying on it.
         pendingDelete.push_back(pkt);
 
         // no need to take any action in this particular cache as the
@@ -537,13 +615,7 @@
 
         // copy writebacks to write buffer here to ensure they logically
         // proceed anything happening below
-        while (!writebacks.empty()) {
-            PacketPtr wbPkt = writebacks.front();
-            // We use forwardLatency here because we are copying
-            // writebacks to write buffer.
-            allocateWriteBuffer(wbPkt, forward_time, true);
-            writebacks.pop_front();
-        }
+        doWritebacks(writebacks, forward_time);
     }
 
     // Here we charge the headerDelay that takes into account the latencies
@@ -591,8 +663,10 @@
             cpuSidePort->schedTimingResp(pkt, request_time);
         } else {
             /// @todo nominally we should just delete the packet here,
-            /// however, until 4-phase stuff we can't because sending
-            /// cache is still relying on it
+            /// however, until 4-phase stuff we can't because sending cache is
+            /// still relying on it. If the block is found in access(),
+            /// CleanEvict and Writeback messages will be deleted here as
+            /// well.
             pendingDelete.push_back(pkt);
         }
     } else {
@@ -660,31 +734,38 @@
 
             // Coalesce unless it was a software prefetch (see above).
             if (pkt) {
-                DPRINTF(Cache, "%s coalescing MSHR for %s addr %#llx size %d\n",
-                        __func__, pkt->cmdString(), pkt->getAddr(),
-                        pkt->getSize());
+                assert(pkt->cmd != MemCmd::Writeback);
+                // CleanEvicts corresponding to blocks which have outstanding
+                // requests in MSHRs can be deleted here.
+                if (pkt->cmd == MemCmd::CleanEvict) {
+                    pendingDelete.push_back(pkt);
+                } else {
+                    DPRINTF(Cache, "%s coalescing MSHR for %s addr %#llx size %d\n",
+                            __func__, pkt->cmdString(), pkt->getAddr(),
+                            pkt->getSize());
 
-                assert(pkt->req->masterId() < system->maxMasters());
-                mshr_hits[pkt->cmdToIndex()][pkt->req->masterId()]++;
-                if (mshr->threadNum != 0/*pkt->req->threadId()*/) {
-                    mshr->threadNum = -1;
+                    assert(pkt->req->masterId() < system->maxMasters());
+                    mshr_hits[pkt->cmdToIndex()][pkt->req->masterId()]++;
+                    if (mshr->threadNum != 0/*pkt->req->threadId()*/) {
+                        mshr->threadNum = -1;
+                    }
+                    // We use forward_time here because it is the same
+                    // considering new targets. We have multiple
+                    // requests for the same address here. It
+                    // specifies the latency to allocate an internal
+                    // buffer and to schedule an event to the queued
+                    // port and also takes into account the additional
+                    // delay of the xbar.
+                    mshr->allocateTarget(pkt, forward_time, order++);
+                    if (mshr->getNumTargets() == numTarget) {
+                        noTargetMSHR = mshr;
+                        setBlocked(Blocked_NoTargets);
+                        // need to be careful with this... if this mshr isn't
+                        // ready yet (i.e. time > curTick()), we don't want to
+                        // move it ahead of mshrs that are ready
+                        // mshrQueue.moveToFront(mshr);
+                    }
                 }
-                // We use forward_time here because it is the same
-                // considering new targets. We have multiple requests for the
-                // same address here. It specifies the latency to allocate an
-                // internal buffer and to schedule an event to the queued
-                // port and also takes into account the additional delay of
-                // the xbar.
-                mshr->allocateTarget(pkt, forward_time, order++);
-                if (mshr->getNumTargets() == numTarget) {
-                    noTargetMSHR = mshr;
-                    setBlocked(Blocked_NoTargets);
-                    // need to be careful with this... if this mshr isn't
-                    // ready yet (i.e. time > curTick()), we don't want to
-                    // move it ahead of mshrs that are ready
-                    // mshrQueue.moveToFront(mshr);
-                }
-
                 // We should call the prefetcher reguardless if the request is
                 // satisfied or not, reguardless if the request is in the MSHR or
                 // not.  The request could be a ReadReq hit, but still not
@@ -707,7 +788,7 @@
                 mshr_misses[pkt->cmdToIndex()][pkt->req->masterId()]++;
             }
 
-            if (pkt->cmd == MemCmd::Writeback ||
+            if (pkt->evictingBlock() ||
                 (pkt->req->isUncacheable() && pkt->isWrite())) {
                 // We use forward_time here because there is an
                 // uncached memory write, forwarded to WriteBuffer. It
@@ -782,7 +863,8 @@
     }
 
     if (!blkValid &&
-        (cpu_pkt->cmd == MemCmd::Writeback || cpu_pkt->isUpgrade())) {
+        (cpu_pkt->isUpgrade() ||
+         cpu_pkt->evictingBlock())) {
         // Writebacks that weren't allocated in access() and upgrades
         // from upper-level caches that missed completely just go
         // through.
@@ -834,8 +916,9 @@
     assert(pkt->getAddr() == blockAlign(pkt->getAddr()));
 
     pkt->allocate();
-    DPRINTF(Cache, "%s created %s addr %#llx size %d\n",
-            __func__, pkt->cmdString(), pkt->getAddr(), pkt->getSize());
+    DPRINTF(Cache, "%s created %s from %s for  addr %#llx size %d\n",
+            __func__, pkt->cmdString(), cpu_pkt->cmdString(), pkt->getAddr(),
+            pkt->getSize());
     return pkt;
