Sakshi Tiwari has uploaded this change for review. (
https://gem5-review.googlesource.com/c/public/gem5/+/23623 )
Change subject: mem-cache: Implement non-pipelined classical caches
......................................................................
mem-cache: Implement non-pipelined classical caches
In the current implementation of classical caches in gem5, the cache
access delay is modeled in a way that allows a lower-numbered cache to
successfully submit a new request to the next higher-numbered cache
every clock cycle. This is only possible if the cache is effectively
fully pipelined; the only situation in which such a cache cannot accept
new requests is when its MSHR queue is full. This change adds support
for caches that are not fully pipelined: the user can specify the
minimum number of cycles that must elapse after the last accepted
request before a new request is accepted. A new configuration option
called busy_latency has been added to configs/common/Caches.py for this
purpose. The default value of busy_latency is 0, which retains the
current behavior of gem5; we have verified this experimentally.
Specifying larger values enables gem5 to simulate non-pipelined caches.
We have used this feature in a research project, and it may be useful
to other users as well.
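As a sketch of how the option could be used from a configuration
script (the L3Cache class name, size, and latencies below are
illustrative assumptions, not part of this patch):

```python
# Hypothetical gem5 config fragment: give a shared L3 a 4-cycle busy
# window, so it accepts at most one new request every 4 cycles.
# Parameter names follow configs/common/Caches.py; values are made up.
from m5.objects import Cache

class L3Cache(Cache):
    size = '4MB'
    assoc = 16
    tag_latency = 50
    data_latency = 50
    response_latency = 50
    busy_latency = 4   # new option from this change; 0 = fully pipelined
    mshrs = 20
    tgts_per_mshr = 12
```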
The following example demonstrates how busy_latency works. Consider a
system with private L2 caches and a shared, single-ported L3 cache. If
two different L2 caches send requests to the L3 at the same time, only
one of the requests is accepted by the L3 cache. At this point, the L3
cache blocks its cpuSidePort, and the other request waits in its L2
cache's MSHR until busy_latency cycles have elapsed (assuming that the
MSHR queue of the L3 cache is not full). The L3 cache then sends a
retry signal to the ports waiting in the waitingForLayer queue of the
crossbar, and the pending requests are taken from their respective
ports in FCFS order.
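The timing described above can be sketched with a toy model (plain
Python, not gem5 code; cycle numbers are illustrative): requests that
arrive while the cache is inside its busy window are deferred and
accepted later in arrival (FCFS) order.

```python
# Toy model of the busy_latency accept/retry behavior described above.
def accepted_cycles(arrivals, busy_latency):
    """Given sorted request arrival cycles, return the cycle at which
    each request is actually accepted by a single-ported cache that
    stays busy for `busy_latency` cycles after each accept."""
    accepts = []
    free_at = 0  # first cycle at which the cache can accept again
    for t in sorted(arrivals):
        start = max(t, free_at)
        accepts.append(start)
        free_at = start + busy_latency
    return accepts

# Two L2s send to the L3 in the same cycle; with busy_latency = 4 the
# second request is not accepted until 4 cycles after the first.
print(accepted_cycles([10, 10], 4))   # -> [10, 14]
# busy_latency = 0 keeps the current fully pipelined behavior.
print(accepted_cycles([10, 10], 0))   # -> [10, 10]
```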
The busy_latency configuration option is only supported for the L2 and
L3 cache levels. It is not supported for the L1 caches (i.e., icache
and dcache) because the split-packet handling logic appears to cause
issues there.
Change-Id: I9a98e266ffad4a1541d9b23d7ff2facb525035ea
Signed-off-by: Sakshi Tiwari <[email protected]>
---
M configs/common/Caches.py
M src/mem/XBar.py
M src/mem/cache/Cache.py
M src/mem/cache/base.cc
M src/mem/cache/base.hh
M src/mem/cache/cache.cc
6 files changed, 128 insertions(+), 7 deletions(-)
diff --git a/configs/common/Caches.py b/configs/common/Caches.py
index f8edc8b..72613cf 100644
--- a/configs/common/Caches.py
+++ b/configs/common/Caches.py
@@ -51,12 +51,17 @@
# specific instantiations.
class L1Cache(Cache):
- assoc = 2
+ assoc = 4
tag_latency = 2
data_latency = 2
+ busy_latency = 0 # L1 caches do not accept values larger than 0.
response_latency = 2
mshrs = 4
tgts_per_mshr = 20
+ if busy_latency != 0:
+ print("WARNING: L1 caches do not accept values larger than 0. "
+ "Setting L1 cache busy_latency to 0.\n")
+ busy_latency = 0
class L1_ICache(L1Cache):
is_read_only = True
@@ -68,9 +73,10 @@
class L2Cache(Cache):
assoc = 8
- tag_latency = 20
- data_latency = 20
- response_latency = 20
+ tag_latency = 10
+ data_latency = 10
+ busy_latency = 0
+ response_latency = 10
mshrs = 20
tgts_per_mshr = 12
write_buffers = 8
@@ -79,6 +85,7 @@
assoc = 8
tag_latency = 50
data_latency = 50
+ busy_latency = 0
response_latency = 50
mshrs = 20
size = '1kB'
@@ -88,6 +95,7 @@
assoc = 2
tag_latency = 2
data_latency = 2
+ busy_latency = 0
response_latency = 2
mshrs = 10
size = '1kB'
diff --git a/src/mem/XBar.py b/src/mem/XBar.py
index dab961f..630b0da 100644
--- a/src/mem/XBar.py
+++ b/src/mem/XBar.py
@@ -155,6 +155,30 @@
# to the first level of unified cache.
point_of_unification = True
+# We use a coherent crossbar to connect the private L2 caches to the
+# shared L3 cache. Normally this crossbar would be part of the cache itself.
+class L3XBar(CoherentXBar):
+ # 256-bit crossbar by default
+ width = 32
+
+ # Assume that most of this is covered by the cache latencies, with
+ # no more than a single pipeline stage for any packet.
+ frontend_latency = 1
+ forward_latency = 0
+ response_latency = 1
+ snoop_response_latency = 1
+
+ # Use a snoop-filter by default, and set the latency to zero as
+ # the lookup is assumed to overlap with the frontend latency of
+ # the crossbar
+ snoop_filter = SnoopFilter(lookup_latency = 0)
+
+ # This specialisation of the coherent crossbar connects the private
+ # L2 caches to the shared L3 cache.
+ point_of_unification = True
+
+
# One of the key coherent crossbar instances is the system
# interconnect, tying together the CPU clusters, GPUs, and any I/O
# coherent masters, and DRAM controllers.
diff --git a/src/mem/cache/Cache.py b/src/mem/cache/Cache.py
index 7a28136..79bd385 100644
--- a/src/mem/cache/Cache.py
+++ b/src/mem/cache/Cache.py
@@ -82,6 +82,7 @@
tag_latency = Param.Cycles("Tag lookup latency")
data_latency = Param.Cycles("Data access latency")
+ busy_latency = Param.Cycles(0, "Minimum cycles between accepted requests")
response_latency = Param.Cycles("Latency for the return path on a miss")
warmup_percentage = Param.Percent(0,
diff --git a/src/mem/cache/base.cc b/src/mem/cache/base.cc
index ebfb092..e8246b2 100644
--- a/src/mem/cache/base.cc
+++ b/src/mem/cache/base.cc
@@ -64,6 +64,9 @@
#include "params/WriteAllocator.hh"
#include "sim/core.hh"
+class BaseMasterPort;
+class BaseSlavePort;
+
using namespace std;
BaseCache::CacheSlavePort::CacheSlavePort(const std::string &_name,
@@ -78,6 +81,7 @@
BaseCache::BaseCache(const BaseCacheParams *p, unsigned blk_size)
: ClockedObject(p),
+ cacheUnblockEvent(this),
cpuSidePort (p->name + ".cpu_side", this, "CpuSidePort"),
memSidePort(p->name + ".mem_side", this, "MemSidePort"),
mshrQueue("MSHRs", p->mshrs, 0, p->demand_mshr_reserve), // see below
@@ -94,6 +98,7 @@
blkSize(blk_size),
lookupLatency(p->tag_latency),
dataLatency(p->data_latency),
+ busyLatency(p->busy_latency),
forwardLatency(p->tag_latency),
fillLatency(p->data_latency),
responseLatency(p->response_latency),
@@ -107,6 +112,8 @@
noTargetMSHR(nullptr),
missCount(p->max_miss_count),
addrRanges(p->addr_ranges.begin(), p->addr_ranges.end()),
+ cacheBusy(false),
+ nextCacheAccessTick(0),
system(p->system),
stats(*this)
{
@@ -164,8 +171,25 @@
DPRINTF(CachePort, "Port is sending retry\n");
// reset the flag and call retry
- mustSendRetry = false;
- sendRetryReq();
+ if (mustSendRetry) {
+ mustSendRetry = false;
+ sendRetryReq();
+ }
+}
+
+void
+BaseCache::processCacheUnblock()
+{
+ DPRINTF(CachePort, "Cache is set to free\n");
+
+ clearCacheBusy();
+ if (cpuSidePort.isMustSendRetry() && !cpuSidePort.isBlocked()) {
+ cpuSidePort.clearMustSendRetry();
+ cpuSidePort.sendRetryReq();
+ } else {
+ DPRINTF(CachePort, "No retry pending on cpuSidePort\n");
+ }
}
Addr
@@ -2350,6 +2374,9 @@
return true;
} else if (tryTiming(pkt)) {
cache->recvTimingReq(pkt);
+ if (mustSendRetry) {
+ return false;
+ }
return true;
}
return false;
diff --git a/src/mem/cache/base.hh b/src/mem/cache/base.hh
index cd467c8..e1a4024 100644
--- a/src/mem/cache/base.hh
+++ b/src/mem/cache/base.hh
@@ -258,6 +258,12 @@
bool isBlocked() const { return blocked; }
+ bool isMustSendRetry() const { return mustSendRetry; }
+
+ void clearMustSendRetry() { mustSendRetry = false; }
+
+ void setMustSendRetry() { mustSendRetry = true; }
+
protected:
CacheSlavePort(const std::string &_name, BaseCache *_cache,
@@ -309,6 +315,13 @@
};
+
+ void processCacheUnblock();
+
+ EventWrapper<BaseCache,
+ &BaseCache::processCacheUnblock> cacheUnblockEvent;
+
+
CpuSidePort cpuSidePort;
MemSidePort memSidePort;
@@ -843,6 +856,12 @@
const Cycles dataLatency;
/**
+ * The latency for which the cache is busy servicing a request
+ * and cannot accept a new request.
+ */
+ const Cycles busyLatency;
+
+ /**
* This is the forward latency of the cache. It occurs when there
* is a cache miss and a request is forwarded downstream, in
* particular an outbound miss.
@@ -909,6 +928,12 @@
const AddrRangeList addrRanges;
public:
+
+ /* blocking the cache access if it is servicing a request */
+ bool cacheBusy;
+
+ Tick nextCacheAccessTick;
+
/** System we are currently operating in. */
System *system;
@@ -1136,6 +1161,34 @@
schedMemSideSendEvent(time);
}
+ /** Check if the cache is blocked because it is busy in servicing a
+ * request.
+ */
+ bool isCacheBusy() const
+ {
+ return cacheBusy;
+ }
+
+ /** Sets the cache as busy because it has started servicing a request.
+ * It will be free after cache access latency.
+ */
+ void setCacheBusy()
+ {
+ if (nextCacheAccessTick == curTick()) {
+ cacheBusy = false;
+ } else {
+ cacheBusy = true;
+ schedule(cacheUnblockEvent, nextCacheAccessTick);
+ }
+ }
+
+ void clearCacheBusy()
+ {
+ cacheBusy = false;
+ }
+
+
/**
* Returns true if the cache is blocked for accesses.
*/
diff --git a/src/mem/cache/cache.cc b/src/mem/cache/cache.cc
index e7dd5ef..9b9dbbd 100644
--- a/src/mem/cache/cache.cc
+++ b/src/mem/cache/cache.cc
@@ -473,7 +473,15 @@
return;
}
- BaseCache::recvTimingReq(pkt);
+ if (!isCacheBusy() && !cpuSidePort.isBlocked()) {
+ nextCacheAccessTick = clockEdge(busyLatency);
+ setCacheBusy();
+ } else {
+ cpuSidePort.setMustSendRetry();
+ return;
+ }
+ BaseCache::recvTimingReq(pkt);
}
PacketPtr
--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/23623
To unsubscribe, or for help writing mail filters, visit
https://gem5-review.googlesource.com/settings
Gerrit-Project: public/gem5
Gerrit-Branch: master
Gerrit-Change-Id: I9a98e266ffad4a1541d9b23d7ff2facb525035ea
Gerrit-Change-Number: 23623
Gerrit-PatchSet: 1
Gerrit-Owner: Sakshi Tiwari <[email protected]>
Gerrit-MessageType: newchange
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev