Bradford Beckmann has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/34196 )

Change subject: tests,configs: Updates to gpu protocol tester
......................................................................

tests,configs: Updates to gpu protocol tester

This patch renames the VIPER protocol tester to a more generic
GPU protocol tester name and adds detailed descriptions to the
command line options.

Change-Id: Ia1a0c16302740d60bf2100c82f72c1acf1d39609
---
M configs/example/ruby_gpu_random_test.py
D configs/example/viper_ruby_test.py
2 files changed, 246 insertions(+), 437 deletions(-)



diff --git a/configs/example/ruby_gpu_random_test.py b/configs/example/ruby_gpu_random_test.py
index d40a942..d32a201 100644
--- a/configs/example/ruby_gpu_random_test.py
+++ b/configs/example/ruby_gpu_random_test.py
@@ -1,4 +1,5 @@
-# Copyright (c) 2010-2015 Advanced Micro Devices, Inc.
+#
+# Copyright (c) 2018 Advanced Micro Devices, Inc.
 # All rights reserved.
 #
 # For use for simulation and test purposes only
@@ -43,7 +44,7 @@
 from common import Options
 from ruby import Ruby

-# Get paths we might need.
+# Get paths we might need. It's expected this file is in m5/configs/example.
 config_path = os.path.dirname(os.path.abspath(__file__))
 config_root = os.path.dirname(config_path)
 m5_root = os.path.dirname(config_root)
@@ -51,25 +52,53 @@
 parser = optparse.OptionParser()
 Options.addNoISAOptions(parser)

-parser.add_option("--maxloads", metavar="N", default=100,
-                  help="Stop after N loads")
-parser.add_option("-f", "--wakeup_freq", metavar="N", default=10,
-                  help="Wakeup every N cycles")
-parser.add_option("-u", "--num-compute-units", type="int", default=1,
-                  help="number of compute units in the GPU")
-parser.add_option("--num-cp", type="int", default=0,
-                  help="Number of GPU Command Processors (CP)")
-# not super important now, but to avoid putting the number 4 everywhere, make
-# it an option/knob
-parser.add_option("--cu-per-sqc", type="int", default=4, help="number of CUs \
-                  sharing an SQC (icache, and thus icache TLB)")
-parser.add_option("--simds-per-cu", type="int", default=4, help="SIMD units" \
-                  "per CU")
-parser.add_option("--wf-size", type="int", default=64,
-                  help="Wavefront size(in workitems)")
-parser.add_option("--wfs-per-simd", type="int", default=10, help="Number of " \
-                  "WF slots per SIMD")
+# GPU Ruby tester options
+parser.add_option("--cache-size", type="int", default=0,
+ help="Cache sizes to use. Small encourages races between \ + requests and writebacks. Large stresses write-through \
+                        and/or write-back GPU caches. Range [0..1]")
+parser.add_option("--system-size", type="int", default=0,
+                  help="This option defines how many CUs, CPUs and cache \
+                        components in the test system. Range[0..2]")
+parser.add_option("--address-range", type="int", default=0,
+                  help="This option defines the number of atomic \
+                        locations that affects the working set's size. \
+                        A small number of atomic locations encourage more \
+ races among threads. The large option stresses cache \
+                        resources. Range [0..1]")
+parser.add_option("--episode-length", type="int", default=0,
+                  help="This option defines the number of LDs and \
+ STs in an episode. The small option encourages races \
+                        between the start and end of an episode. The long \
+ option encourages races between LDs and STs in the \
+                        same episode. Range [0..2]")
+parser.add_option("--test-length", type="int", default=1,
+                  help="The number of episodes to be executed by each \
+ wavefront. This determines the maximum number, i.e., \ + val X #WFs, of episodes to be executed in the test.")
+parser.add_option("--debug-tester", action='store_true',
+                  help="This option will turn on DRF checker")
+parser.add_option("--random-seed", type="int", default=0,
+                  help="Random seed number. Default value (i.e., 0) means \
+                        using runtime-specific value")
+parser.add_option("--log-file", type="string", default="gpu-ruby-test.log")

+# GPU configurations
+parser.add_option("--wf-size", type="int", default=64, help="wavefront size")
+
+parser.add_option("-w", "--wavefronts-per-cu", type="int", default=1,
+                  help="Number of wavefronts per cu")
+
+parser.add_option("--cu-per-sqc", type="int", default=4,
+                  help="number of CUs sharing an SQC")
+
+parser.add_option("--cu-per-scalar-cache", type="int", default=4,
+                  help="number of CUs sharing an scalar cache")
+
+parser.add_option("--cu-per-sa", type="int", default=4,
+                  help="number of CUs per shader array \
+                        This must be a multiple of options.cu-per-sqc and \
+                        options.cu-per-scalar")
 #
 # Add the ruby specific and protocol specific options
 #
@@ -94,86 +123,232 @@
 options.l2_assoc=2
 options.l3_assoc=2

-# This file can support multiple compute units
-assert(options.num_compute_units >= 1)
-n_cu = options.num_compute_units
+#
+# Set up cache size - 2 options
+#   0: small cache
+#   1: large cache
+#
+if (options.cache_size == 0):
+    options.tcp_size="256B"
+    options.tcp_assoc=2
+    options.tcc_size="1kB"
+    options.tcc_assoc=2
+elif (options.cache_size == 1):
+    options.tcp_size="256kB"
+    options.tcp_assoc=16
+    options.tcc_size="1024kB"
+    options.tcc_assoc=16
+else:
+    print("Error: option cache_size '%s' not recognized", options.cache_size)
+    sys.exit(1)

-options.num_sqc = int((n_cu + options.cu_per_sqc - 1) // options.cu_per_sqc)
+#
+# Set up system size - 3 options
+#
+if (options.system_size == 0):
+    # 1 CU, 1 CPU, 1 SQC, 1 Scalar
+    options.wf_size = 1
+    options.wavefronts_per_cu = 1
+    options.num_cpus = 1
+    options.cu_per_sqc = 1
+    options.cu_per_scalar_cache = 1
+    options.num_compute_units = 1
+elif (options.system_size == 1):
+    # 4 CUs, 4 CPUs, 1 SQCs, 1 Scalars
+    options.wf_size = 16
+    options.wavefronts_per_cu = 4
+    options.num_cpus = 4
+    options.cu_per_sqc = 4
+    options.cu_per_scalar_cache = 4
+    options.num_compute_units = 4
+elif (options.system_size == 2):
+    # 8 CUs, 4 CPUs, 1 SQCs, 1 Scalars
+    options.wf_size = 32
+    options.wavefronts_per_cu = 4
+    options.num_cpus = 4
+    options.cu_per_sqc = 4
+    options.cu_per_scalar_cache = 4
+    options.num_compute_units = 8
+else:
+    print("Error: option system size '%s' not recognized", options.system_size)
+    sys.exit(1)
+
+#
+# set address range - 2 options
+#   level 0: small
+#   level 1: large
+# each location corresponds to a 4-byte piece of data
+#
+options.mem_size = '1024MB'
+num_atomic_locs = 10
+num_regular_locs_per_atomic_loc = 10000
+if (options.address_range == 1):
+    num_atomic_locs = 100
+    num_regular_locs_per_atomic_loc = 100000
+elif (options.address_range != 0):
+    print("Error: option address_range '%s' not recognized", \
+              options.address_range)
+    sys.exit(1)
+
+#
+# set episode length (# of actions per episode) - 3 options
+#   0: 10 actions
+#   1: 100 actions
+#   2: 500 actions
+#
+eps_length = 10
+if (options.episode_length == 1):
+    eps_length = 100
+elif (options.episode_length == 2):
+    eps_length = 500
+elif (options.episode_length != 0):
+    print("Error: option episode_length '%s' not recognized",
+              options.episode_length)
+    sys.exit(1)
+
+# set the Ruby's and tester's deadlock thresholds
+# the Ruby's deadlock detection is the primary check for deadlock.
+# the tester's deadlock threshold detection is a secondary check for deadlock
+# if there is a bug in RubyPort that causes a packet not to return to the
+# tester properly, the tester will throw a deadlock exception.
+# we set cache_deadlock_threshold < tester_deadlock_threshold to detect
+# deadlock caused by Ruby protocol first before one caused by the coalescer
+options.cache_deadlock_threshold = 100000000
+tester_deadlock_threshold = 1000000000
+
+# for now, we're testing only GPU protocol, so we set num_cpus to 0
+options.num_cpus = 0
+# number of CPUs and CUs
+n_CPUs = options.num_cpus
+n_CUs = options.num_compute_units
+# set test length, i.e., number of episodes per wavefront * #WFs
+# test length can be 1x#WFs, 10x#WFs, 100x#WFs, ...
+n_WFs = n_CUs * options.wavefronts_per_cu
+max_episodes = options.test_length * n_WFs
+# number of SQC and Scalar caches
+assert(n_CUs % options.cu_per_sqc == 0)
+n_SQCs = int(n_CUs/options.cu_per_sqc)
+options.num_sqc = n_SQCs
+assert(options.cu_per_scalar_cache != 0)
+n_Scalars = int(n_CUs/options.cu_per_scalar_cache)
+options.num_scalar_cache = n_Scalars

 if args:
      print("Error: script doesn't take any positional arguments")
      sys.exit(1)

 #
-# Create the ruby random tester
+# Create GPU Ruby random tester
 #
-
-# Check to for the GPU_RfO protocol. Other GPU protocols are non-SC and will
-# not work with the Ruby random tester.
-assert(buildEnv['PROTOCOL'] == 'GPU_RfO')
-
-# The GPU_RfO protocol does not support cache flushes
-check_flush = False
-
-tester = RubyTester(check_flush=check_flush,
-                    checks_to_complete=options.maxloads,
-                    wakeup_frequency=options.wakeup_freq,
-                    deadlock_threshold=1000000)
+tester = ProtocolTester(cus_per_sqc = options.cu_per_sqc,
+                        cus_per_scalar = options.cu_per_scalar_cache,
+                        wavefronts_per_cu = options.wavefronts_per_cu,
+                        workitems_per_wavefront = options.wf_size,
+                        num_atomic_locations = num_atomic_locs,
+                        num_normal_locs_per_atomic = \
+                                          num_regular_locs_per_atomic_loc,
+                        max_num_episodes = max_episodes,
+                        episode_length = eps_length,
+                        debug_tester = options.debug_tester,
+                        random_seed = options.random_seed,
+                        log_file = options.log_file)

 #
-# Create the M5 system.  Note that the Memory Object isn't
-# actually used by the rubytester, but is included to support the
-# M5 memory size == Ruby memory size checks
+# Create the M5 system. Note that the memory object isn't actually
+# used by the tester, but is included to support
+# the M5 memory size == Ruby memory size checks
 #
-system = System(cpu=tester, mem_ranges=[AddrRange(options.mem_size)])
+# The system doesn't have real CPUs or CUs.
+# It just has a tester that has physical ports to be connected to Ruby
+#
+system = System(cpu = tester,
+                mem_ranges = [AddrRange(options.mem_size)],
+                cache_line_size = options.cacheline_size,
+                mem_mode = 'timing')

-# Create a top-level voltage domain and clock domain
-system.voltage_domain = VoltageDomain(voltage=options.sys_voltage)
+system.voltage_domain = VoltageDomain(voltage = options.sys_voltage)
+system.clk_domain = SrcClockDomain(clock = options.sys_clock,
+                                   voltage_domain = system.voltage_domain)

-system.clk_domain = SrcClockDomain(clock=options.sys_clock,
-                                   voltage_domain=system.voltage_domain)
+options.num_cp = 0

+#
+# Create the Ruby system
+#
 Ruby.create_system(options, False, system)

-# Create a seperate clock domain for Ruby
-system.ruby.clk_domain = SrcClockDomain(clock=options.ruby_clock,
-                                        voltage_domain=system.voltage_domain)
-
-tester.num_cpus = len(system.ruby._cpu_ports)
-
 #
 # The tester is most effective when randomization is turned on and
 # artifical delay is randomly inserted on messages
 #
 system.ruby.randomization = True

+# assert that we got the right number of Ruby ports
+assert(len(system.ruby._cpu_ports) == n_CPUs + n_CUs + n_SQCs + n_Scalars)
+
+#
+# attach Ruby ports to the tester
+# in the order: cpu_sequencers,
+#               vector_coalescers,
+#               sqc_sequencers,
+#               scalar_sequencers
+#
+print("Attaching ruby ports to the tester")
+i = 0
 for ruby_port in system.ruby._cpu_ports:
-
-    #
-    # Tie the ruby tester ports to the ruby cpu read and write ports
-    #
-    if ruby_port.support_data_reqs and ruby_port.support_inst_reqs:
-        tester.cpuInstDataPort = ruby_port.slave
-    elif ruby_port.support_data_reqs:
-        tester.cpuDataPort = ruby_port.slave
-    elif ruby_port.support_inst_reqs:
-        tester.cpuInstPort = ruby_port.slave
-
-    # Do not automatically retry stalled Ruby requests
     ruby_port.no_retry_on_stall = True
-
-    #
-    # Tell each sequencer this is the ruby tester so that it
-    # copies the subblock back to the checker
-    #
     ruby_port.using_ruby_tester = True

+    if i < n_CPUs:
+        tester.cpu_ports = ruby_port.slave
+    elif i < (n_CPUs + n_CUs):
+        tester.cu_vector_ports = ruby_port.slave
+    elif i < (n_CPUs + n_CUs + n_SQCs):
+        tester.cu_sqc_ports = ruby_port.slave
+    else:
+        tester.cu_scalar_ports = ruby_port.slave
+
+    i += 1
+
+#
+# Create CPU threads
+#
+thread_clock = SrcClockDomain(clock = '1GHz',
+                              voltage_domain = system.voltage_domain)
+
+cpu_threads = []
+print("Creating %i CpuThreads" % n_CPUs)
+for cpu_idx in range(n_CPUs):
+    cpu_threads.append(CpuThread(thread_id = cpu_idx,
+                                 num_lanes = 1,     # CPU thread is scalar
+                                 clk_domain = thread_clock,
+                                 deadlock_threshold = \
+                                        tester_deadlock_threshold))
+tester.cpu_threads = cpu_threads
+
+#
+# Create GPU wavefronts
+#
+wavefronts = []
+g_thread_idx = n_CPUs
+print("Creating %i WFs attached to %i CUs" % \
+                (n_CUs * tester.wavefronts_per_cu, n_CUs))
+for cu_idx in range(n_CUs):
+    for wf_idx in range(tester.wavefronts_per_cu):
+        wavefronts.append(GpuWavefront(thread_id = g_thread_idx,
+                                         cu_id = cu_idx,
+                                         num_lanes = options.wf_size,
+                                         clk_domain = thread_clock,
+                                         deadlock_threshold = \
+                                                tester_deadlock_threshold))
+        g_thread_idx += 1
+tester.wavefronts = wavefronts
+
 # -----------------------
 # run simulation
 # -----------------------

 root = Root( full_system = False, system = system )
-root.system.mem_mode = 'timing'

 # Not much point in this being higher than the L1 latency
 m5.ticks.setGlobalFrequency('1ns')
@@ -184,4 +359,5 @@
 # simulate until program terminates
 exit_event = m5.simulate(options.abs_max_tick)

-print('Exiting @ tick', m5.curTick(), 'because', exit_event.getCause())
+print('Exiting tick: ', m5.curTick())
+print('Exiting because ', exit_event.getCause())
diff --git a/configs/example/viper_ruby_test.py b/configs/example/viper_ruby_test.py
deleted file mode 100644
index 1c2e4ba..0000000
--- a/configs/example/viper_ruby_test.py
+++ /dev/null
@@ -1,367 +0,0 @@
-#
-# Copyright (c) 2018 Advanced Micro Devices, Inc.
-# All rights reserved.
-#
-# For use for simulation and test purposes only
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions are met:
-#
-# 1. Redistributions of source code must retain the above copyright notice,
-# this list of conditions and the following disclaimer.
-#
-# 2. Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# 3. Neither the name of the copyright holder nor the names of its
-# contributors may be used to endorse or promote products derived from this
-# software without specific prior written permission.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
-# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-# POSSIBILITY OF SUCH DAMAGE.
-#
-# Authors: Tuan Ta, Xianwei Zhang
-#
-
-import m5
-from m5.objects import *
-from m5.defines import buildEnv
-from m5.util import addToPath
-import os, optparse, sys
-
-addToPath('../')
-
-from common import Options
-from ruby import Ruby
-
-# Get paths we might need. It's expected this file is in m5/configs/example.
-config_path = os.path.dirname(os.path.abspath(__file__))
-config_root = os.path.dirname(config_path)
-m5_root = os.path.dirname(config_root)
-
-parser = optparse.OptionParser()
-Options.addNoISAOptions(parser)
-
-# GPU Ruby tester options
-parser.add_option("--cache-size", type="int", default=0,
- help="Cache sizes to use. Small encourages races between \ - requests and writebacks. Large stresses write-through \
-                        and/or write-back GPU caches. Range [0..1]")
-parser.add_option("--system-size", type="int", default=0,
-                  help="This option defines how many CUs, CPUs and cache \
-                        components in the test system. Range[0..2]")
-parser.add_option("--address-range", type="int", default=0,
-                  help="This option defines the number of atomic \
-                        locations that affects the working set's size. \
-                        A small number of atomic locations encourage more \
- races among threads. The large option stresses cache \
-                        resources. Range [0..1]")
-parser.add_option("--episode-length", type="int", default=0,
-                  help="This option defines the number of LDs and \
- STs in an episode. The small option encourages races \
-                        between the start and end of an episode. The long \
- option encourages races between LDs and STs in the \
-                        same episode. Range [0..2]")
-parser.add_option("--test-length", type="int", default=1,
-                  help="The number of episodes to be executed by each \
- wavefront. This determines the maximum number, i.e., \ - val X #WFs, of episodes to be executed in the test.")
-parser.add_option("--debug-tester", action='store_true',
-                  help="This option will turn on DRF checker")
-parser.add_option("--random-seed", type="int", default=0,
-                  help="Random seed number. Default value (i.e., 0) means \
-                        using runtime-specific value")
-parser.add_option("--log-file", type="string", default="gpu-ruby-test.log")
-
-# GPU configurations
-parser.add_option("--wf-size", type="int", default=64, help="wavefront size")
-
-parser.add_option("-w", "--wavefronts-per-cu", type="int", default=1,
-                  help="Number of wavefronts per cu")
-
-parser.add_option("--cu-per-sqc", type="int", default=4,
-                  help="number of CUs sharing an SQC")
-
-parser.add_option("--cu-per-scalar-cache", type="int", default=4,
-                  help="number of CUs sharing an scalar cache")
-
-parser.add_option("--cu-per-sa", type="int", default=4,
-                  help="number of CUs per shader array \
-                        This must be a multiple of options.cu-per-sqc and \
-                        options.cu-per-scalar")
-#
-# Add the ruby specific and protocol specific options
-#
-Ruby.define_options(parser)
-
-execfile(os.path.join(config_root, "common", "Options.py"))
-
-(options, args) = parser.parse_args()
-
-#
-# Set the default cache size and associativity to be very small to encourage
-# races between requests and writebacks.
-#
-options.l1d_size="256B"
-options.l1i_size="256B"
-options.l2_size="512B"
-options.l3_size="1kB"
-options.l1d_assoc=2
-options.l1i_assoc=2
-options.l2_assoc=2
-options.l3_assoc=2
-
-#
-# Set up cache size - 2 options
-#   0: small cache
-#   1: large cache
-#
-if (options.cache_size == 0):
-    options.tcp_size="256B"
-    options.tcp_assoc=2
-    options.tcc_size="1kB"
-    options.tcc_assoc=2
-elif (options.cache_size == 1):
-    options.tcp_size="256kB"
-    options.tcp_assoc=16
-    options.tcc_size="1024kB"
-    options.tcc_assoc=16
-else:
-    print("Error: option cache_size '%s' not recognized", options.cache_size)
-    sys.exit(1)
-
-#
-# Set up system size - 3 options
-#
-if (options.system_size == 0):
-    # 1 CU, 1 CPU, 1 SQC, 1 Scalar
-    options.wf_size = 1
-    options.wavefronts_per_cu = 1
-    options.num_cpus = 1
-    options.cu_per_sqc = 1
-    options.cu_per_scalar_cache = 1
-    options.num_compute_units = 1
-elif (options.system_size == 1):
-    # 4 CUs, 4 CPUs, 1 SQCs, 1 Scalars
-    options.wf_size = 16
-    options.wavefronts_per_cu = 4
-    options.num_cpus = 4
-    options.cu_per_sqc = 4
-    options.cu_per_scalar_cache = 4
-    options.num_compute_units = 4
-elif (options.system_size == 2):
-    # 8 CUs, 4 CPUs, 1 SQCs, 1 Scalars
-    options.wf_size = 32
-    options.wavefronts_per_cu = 4
-    options.num_cpus = 4
-    options.cu_per_sqc = 4
-    options.cu_per_scalar_cache = 4
-    options.num_compute_units = 8
-else:
-    print("Error: option system size '%s' not recognized", options.system_size)
-    sys.exit(1)
-
-#
-# set address range - 2 options
-#   level 0: small
-#   level 1: large
-# each location corresponds to a 4-byte piece of data
-#
-options.mem_size = '1024MB'
-num_atomic_locs = 10
-num_regular_locs_per_atomic_loc = 10000
-if (options.address_range == 1):
-    num_atomic_locs = 100
-    num_regular_locs_per_atomic_loc = 100000
-elif (options.address_range != 0):
-    print("Error: option address_range '%s' not recognized", \
-              options.address_range)
-    sys.exit(1)
-
-#
-# set episode length (# of actions per episode) - 3 options
-#   0: 10 actions
-#   1: 100 actions
-#   2: 500 actions
-#
-eps_length = 10
-if (options.episode_length == 1):
-    eps_length = 100
-elif (options.episode_length == 2):
-    eps_length = 500
-elif (options.episode_length != 0):
-    print("Error: option episode_length '%s' not recognized",
-              options.episode_length)
-    sys.exit(1)
-
-# set the Ruby's and tester's deadlock thresholds
-# the Ruby's deadlock detection is the primary check for deadlock.
-# the tester's deadlock threshold detection is a secondary check for deadlock
-# if there is a bug in RubyPort that causes a packet not to return to the
-# tester properly, the tester will throw a deadlock exception.
-# we set cache_deadlock_threshold < tester_deadlock_threshold to detect
-# deadlock caused by Ruby protocol first before one caused by the coalescer
-options.cache_deadlock_threshold = 100000000
-tester_deadlock_threshold = 1000000000
-
-# for now, we're testing only GPU protocol, so we set num_cpus to 0
-options.num_cpus = 0
-# number of CPUs and CUs
-n_CPUs = options.num_cpus
-n_CUs = options.num_compute_units
-# set test length, i.e., number of episodes per wavefront * #WFs
-# test length can be 1x#WFs, 10x#WFs, 100x#WFs, ...
-n_WFs = n_CUs * options.wavefronts_per_cu
-max_episodes = options.test_length * n_WFs
-# number of SQC and Scalar caches
-assert(n_CUs % options.cu_per_sqc == 0)
-n_SQCs = int(n_CUs/options.cu_per_sqc)
-options.num_sqc = n_SQCs
-assert(n_CUs % options.cu_per_scalar_cache == 0)
-n_Scalars = int(n_CUs/options.cu_per_scalar_cache)
-
-# for now, we only set CUs and SQCs
-# TODO: add scalars if necessary
-n_Scalars = 0
-options.num_scalar_cache = n_Scalars
-if n_Scalars == 0:
-    options.cu_per_scalar_cache = 0
-
-if args:
-     print("Error: script doesn't take any positional arguments")
-     sys.exit(1)
-
-#
-# Create GPU Ruby random tester
-#
-tester = ProtocolTester(cus_per_sqc = options.cu_per_sqc,
-                        cus_per_scalar = options.cu_per_scalar_cache,
-                        wavefronts_per_cu = options.wavefronts_per_cu,
-                        workitems_per_wavefront = options.wf_size,
-                        num_atomic_locations = num_atomic_locs,
-                        num_normal_locs_per_atomic = \
-                                          num_regular_locs_per_atomic_loc,
-                        max_num_episodes = max_episodes,
-                        episode_length = eps_length,
-                        debug_tester = options.debug_tester,
-                        random_seed = options.random_seed,
-                        log_file = options.log_file)
-
-#
-# Create the M5 system. Note that the memory object isn't actually
-# used by the vitester, but is included to support
-# the M5 memory size == Ruby memory size checks
-#
-# The system doesn't have real CPUs or CUs.
-# It just has a tester that has physical ports to be connected to Ruby
-#
-system = System(cpu = tester,
-                mem_ranges = [AddrRange(options.mem_size)],
-                cache_line_size = options.cacheline_size,
-                mem_mode = 'timing')
-
-system.voltage_domain = VoltageDomain(voltage = options.sys_voltage)
-system.clk_domain = SrcClockDomain(clock = options.sys_clock,
-                                   voltage_domain = system.voltage_domain)
-
-options.num_cp = 0
-
-#
-# Create the Ruby system
-#
-Ruby.create_system(options, False, system)
-
-#
-# The tester is most effective when randomization is turned on and
-# artifical delay is randomly inserted on messages
-#
-system.ruby.randomization = True
-
-# assert that we got the right number of Ruby ports
-assert(len(system.ruby._cpu_ports) == n_CPUs + n_CUs + n_SQCs + n_Scalars)
-
-#
-# attach Ruby ports to the tester
-# in the order: cpu_sequencers,
-#               vector_coalescers,
-#               sqc_sequencers,
-#               scalar_sequencers
-#
-print("Attaching ruby ports to the tester")
-i = 0
-for ruby_port in system.ruby._cpu_ports:
-    ruby_port.no_retry_on_stall = True
-    ruby_port.using_ruby_tester = True
-
-    if i < n_CPUs:
-        tester.cpu_ports = ruby_port.slave
-    elif i < (n_CPUs + n_CUs):
-        tester.cu_vector_ports = ruby_port.slave
-    elif i < (n_CPUs + n_CUs + n_SQCs):
-        tester.cu_sqc_ports = ruby_port.slave
-    else:
-        tester.cu_scalar_ports = ruby_port.slave
-
-    i += 1
-
-#
-# Create CPU threads
-#
-thread_clock = SrcClockDomain(clock = '1GHz',
-                              voltage_domain = system.voltage_domain)
-
-cpu_threads = []
-print("Creating %i CpuThreads" % n_CPUs)
-for cpu_idx in range(n_CPUs):
-    cpu_threads.append(CpuThread(thread_id = cpu_idx,
-                                 num_lanes = 1,     # CPU thread is scalar
-                                 clk_domain = thread_clock,
-                                 deadlock_threshold = \
-                                        tester_deadlock_threshold))
-tester.cpu_threads = cpu_threads
-
-#
-# Create GPU wavefronts
-#
-wavefronts = []
-g_thread_idx = n_CPUs
-print("Creating %i WFs attached to %i CUs" % \
-                (n_CUs * tester.wavefronts_per_cu, n_CUs))
-for cu_idx in range(n_CUs):
-    for wf_idx in range(tester.wavefronts_per_cu):
-        wavefronts.append(GpuWavefront(thread_id = g_thread_idx,
-                                         cu_id = cu_idx,
-                                         num_lanes = options.wf_size,
-                                         clk_domain = thread_clock,
-                                         deadlock_threshold = \
-                                                tester_deadlock_threshold))
-        g_thread_idx += 1
-tester.wavefronts = wavefronts
-
-# -----------------------
-# run simulation
-# -----------------------
-
-root = Root( full_system = False, system = system )
-
-# Not much point in this being higher than the L1 latency
-m5.ticks.setGlobalFrequency('1ns')
-
-# instantiate configuration
-m5.instantiate()
-
-# simulate until program terminates
-exit_event = m5.simulate(options.abs_max_tick)
-
-print('Exiting tick: ', m5.curTick())
-print('Exiting because ', exit_event.getCause())

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/34196
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ia1a0c16302740d60bf2100c82f72c1acf1d39609
Gerrit-Change-Number: 34196
Gerrit-PatchSet: 1
Gerrit-Owner: Bradford Beckmann <[email protected]>
Gerrit-MessageType: newchange
_______________________________________________
gem5-dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

Reply via email to