Hello,
The main-memory latency is set in a .py file under src/mem. With the
base se.py (if you are using syscall emulation), SimpleMemory is used,
so the file to look at is src/mem/SimpleMemory.py. The default
latency is 30 ns (at 1 GHz, if I remember correctly).
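To put that 30 ns in perspective, here is a quick back-of-the-envelope conversion into CPU cycles at the clock frequencies used in the runs quoted below (the 30 ns figure is SimpleMemory's default; nothing else is assumed):

```python
# Convert SimpleMemory's default 30 ns latency into CPU cycles
# at each clock frequency used in the experiments below.
MEM_LATENCY_NS = 30.0  # SimpleMemory default (src/mem/SimpleMemory.py)

for clock_ghz in (0.5, 1.0, 1.5):
    # cycles = latency in ns * cycles per ns
    cycles = MEM_LATENCY_NS * clock_ghz
    print(f"{clock_ghz} GHz: {cycles:.0f} cycles")
```

So at these clocks the modeled memory is on the order of 15-45 cycles away, not hundreds.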
The opLat you see in the code you posted is the latency of the
load/store's /execution/ in the functional unit; it is in no way the
latency of the memory. Similarly, you will probably see the ALU at one
cycle and the IntMult at more than one cycle. Hope this helps.
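As a side note on the L2 slowdown itself, a rough average-memory-access-time sketch shows how an L2 in the miss path can hurt rather than help. The cache latencies come from the O3_ARM_v7a.py values quoted below and memory is taken as 30 cycles (the 30 ns default at 1 GHz); the miss rates are made-up illustrative numbers, not measured ones:

```python
# Rough AMAT comparison: with vs. without an L2 cache.
# Latencies in cycles; miss rates are hypothetical, for illustration only.
L1_HIT, L2_HIT, MEM = 2, 12, 30

def amat_no_l2(l1_miss_rate):
    # L1 misses go straight to memory.
    return L1_HIT + l1_miss_rate * MEM

def amat_with_l2(l1_miss_rate, l2_miss_rate):
    # L1 misses pay the L2 lookup; L2 misses additionally pay memory.
    return L1_HIT + l1_miss_rate * (L2_HIT + l2_miss_rate * MEM)

# If most L1 misses also miss in the L2, the L2 mostly just adds its
# 12-cycle lookup to every miss, and the no-L2 configuration wins:
print(amat_no_l2(0.10))          # 2 + 0.1 * 30
print(amat_with_l2(0.10, 0.90))  # 2 + 0.1 * (12 + 0.9 * 30)
```

With a main memory only ~30 cycles away, the L2's lookup latency is a large fraction of a full miss, so a poorly-filtering L2 can easily make things slower.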
Arthur Perais.
On 09/02/2013 04:53, Gabriel Yessin wrote:
Hey all,
I really need some input on this one.
I was running BBench and noticed that the run times for architectures
with an L2 cache of any size were MUCH slower than for any architecture
with no L2 cache.
For instance, two of the warm-start times per architecture when loading
Twitter were:
0.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, no L2 cache: 2.547 seconds
0.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, 1024 kB L2 cache: 5.498 seconds
That's a factor-of-2 slowdown for using an L2 cache versus no L2 cache.
More results for Twitter (I have even more than this, but just want to
show the pattern):
1.0 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, no L2 cache: 1.666 seconds
1.0 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, 1024 kB L2 cache: 2.002 seconds
1.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, no L2 cache: 1.697 seconds
1.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, 1024 kB L2 cache: 1.991 seconds
1.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, 2048 kB L2 cache: 1.578 seconds
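For concreteness, the slowdown ratios implied by the warm-start times above can be divided out directly (the numbers are just the quoted times, nothing new):

```python
# Slowdown of the 1024 kB L2 configuration relative to no-L2,
# per clock speed, from the warm-start times quoted above (seconds).
times = {
    0.5: (2.547, 5.498),
    1.0: (1.666, 2.002),
    1.5: (1.697, 1.991),
}
for ghz, (no_l2, with_l2) in times.items():
    print(f"{ghz} GHz: {with_l2 / no_l2:.2f}x slower with a 1024 kB L2")
```

The penalty shrinks as the clock rises, but the L2 never wins at 1024 kB.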
_*My basic commands:*_
./build/ARM/gem5.fast -v --dump-config=config_single_twitter.ini
--outdir=m5out_single_twitter_05GHz_64kB_0kB configs/example/fs.py -b
bbench-gb
--kernel=/home/gyessin/bbench1site/dist_twitter/m5/system/binaries/vmlinux.smp.mouse.arm
--frame-capture --checkpoint-dir=checkpoint_single_twitter
--disk-image=/home/gyessin/bbench1site/dist_twitter/m5/system/disks/ARMv7a-Gingerbread-Android.SMP.mouse.nolock.img
--caches -s 300000000 -r 1 --l1d_size=64kB --l1i_size=64kB --clock=0.5GHz
and
./build/ARM/gem5.fast -v --dump-config=config_single_msn.ini
--outdir=m5out_single_msn_05GHz_16kB_1024kB configs/example/fs.py -b
bbench-gb
--kernel=/home/gyessin/bbench1site/dist_msn/m5/system/binaries/vmlinux.smp.mouse.arm
--frame-capture --checkpoint-dir=checkpoint_single_msn
--disk-image=/home/gyessin/bbench1site/dist_msn/m5/system/disks/ARMv7a-Gingerbread-Android.SMP.mouse.nolock.img
--caches -s 300000000 -r 1 --l1d_size=16kB --l1i_size=16kB --l2cache
--l2_size=1024kB --clock=0.5GHz
(They're restoring from a checkpoint taken right after the sleep 10
in gem5/configs/boot/bbench-gb.rcS, and they are running on a version
cloned from the development repository only a few days ago.)
Looking at configs/common/O3_ARM_v7a.py (relevant bits copied below),
unless I'm misinterpreting something, it would appear that:
L1 Instruction latency = 1 cycle (reasonable)
L1 Data latency = 2 cycles (reasonable)
TLB Cache Latency = 4 cycles (a little low, I think, but fine)
L2 Cache Latency = 12 cycles (reasonable)
*Memory write latency = memory read latency = 2 cycles (AS LOW AS L1
DATA?!!! Seems absurd!)*
Am I understanding this right, or did I misinterpret the code? It
really seems absurd; I would assume MemWrite and MemRead should be
about 200 CPU cycles, correct?
*By the way, I'm not trying to be inflammatory or to insult anyone who
might have edited the code; I'm just trying to get to the bottom of
this ASAP so I can meet my paper deadlines.*
Any input on this would be greatly appreciated.
_*Relevant Parts of O3_ARM_v7a.py:*_
....
# Load/Store Units
class O3_ARM_v7a_Load(FUDesc):
    opList = [ OpDesc(opClass='MemRead', opLat=2) ]
    count = 1

class O3_ARM_v7a_Store(FUDesc):
    opList = [ OpDesc(opClass='MemWrite', opLat=2) ]
    count = 1
....
# Instruction Cache
class O3_ARM_v7a_ICache(BaseCache):
    hit_latency = 1
    response_latency = 1
    ...
# Data Cache
class O3_ARM_v7a_DCache(BaseCache):
    hit_latency = 2
    response_latency = 2
    ...
# TLB Cache
# Use a cache as a L2 TLB
class O3_ARM_v7aWalkCache(BaseCache):
    hit_latency = 4
    response_latency = 4
    ...
# L2 Cache
class O3_ARM_v7aL2(BaseCache):
    hit_latency = 12
    response_latency = 12
    ...
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users