Hey all,
I really need some input on this one.
I was running BBench and noticed that the run times for architectures with an
L2 cache of any size were MUCH slower than for architectures with no L2 cache.
For instance, when loading Twitter, two of the warm-start times per
architecture were:
0.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, no L2 cache: 2.547 seconds
0.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, 1024 kB L2 cache: 5.498 seconds
That's roughly a factor-of-2 slowdown with an L2 cache versus without one.
More results for Twitter (I have even more than this, but just want to show
the pattern):
1.0 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, no L2 cache: 1.666 seconds
1.0 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, 1024 kB L2 cache: 2.002 seconds
1.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, no L2 cache: 1.697 seconds
1.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, 1024 kB L2 cache: 1.991 seconds
1.5 GHz, 16 kB L1 inst cache, 16 kB L1 data cache, 2048 kB L2 cache: 1.578 seconds
*My basic commands:*
./build/ARM/gem5.fast -v --dump-config=config_single_twitter.ini
--outdir=m5out_single_twitter_05GHz_64kB_0kB configs/example/fs.py -b
bbench-gb
--kernel=/home/gyessin/bbench1site/dist_twitter/m5/system/binaries/vmlinux.smp.mouse.arm
--frame-capture --checkpoint-dir=checkpoint_single_twitter
--disk-image=/home/gyessin/bbench1site/dist_twitter/m5/system/disks/ARMv7a-Gingerbread-Android.SMP.mouse.nolock.img
--caches -s 300000000 -r 1 --l1d_size=64kB --l1i_size=64kB --clock=0.5GHz
and
./build/ARM/gem5.fast -v --dump-config=config_single_msn.ini
--outdir=m5out_single_msn_05GHz_16kB_1024kB configs/example/fs.py -b
bbench-gb
--kernel=/home/gyessin/bbench1site/dist_msn/m5/system/binaries/vmlinux.smp.mouse.arm
--frame-capture --checkpoint-dir=checkpoint_single_msn
--disk-image=/home/gyessin/bbench1site/dist_msn/m5/system/disks/ARMv7a-Gingerbread-Android.SMP.mouse.nolock.img
--caches -s 300000000 -r 1 --l1d_size=16kB --l1i_size=16kB --l2cache
--l2_size=1024kB --clock=0.5GHz
(They're restoring from a checkpoint taken right after the sleep 10
in gem5/configs/boot/bbench-gb.rcS, and they're running on a version cloned
from the development repository only a few days ago.)
Looking at configs/common/O3_ARM_v7a.py (relevant bits copied below), unless
I'm misinterpreting something, it would appear that:
L1 instruction latency = 1 cycle (reasonable)
L1 data latency = 2 cycles (reasonable)
TLB walk cache latency = 4 cycles (a little low, I think, but fine)
L2 cache latency = 12 cycles (reasonable)
*Memory write latency = memory read latency = 2 cycles (AS LOW AS L1
DATA?! That seems absurd!)*
Am I understanding this right, or did I misinterpret the code? It really does
seem absurd; I would expect MemWrite and MemRead to be around 200 CPU cycles,
correct?
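To illustrate why the result surprises me, here is a back-of-the-envelope
average-memory-access-time (AMAT) calculation. The hit/miss rates below are
made-up illustrative numbers, and the 200-cycle DRAM figure is my assumption
from above; the 2- and 12-cycle figures are the L1-data and L2 latencies I
read out of O3_ARM_v7a.py. With any plausible numbers like these, adding an
L2 should make memory access faster, not slower:

```python
def amat(hit_lat, miss_rate, miss_penalty):
    """Average memory access time: hit latency plus the miss
    penalty weighted by the miss rate."""
    return hit_lat + miss_rate * miss_penalty

# Assumed numbers: 2-cycle L1 data hit, 10% L1 miss rate,
# 12-cycle L2 hit, 20% L2 miss rate, 200-cycle DRAM access.
mem_lat = 200.0
no_l2 = amat(2, 0.10, mem_lat)                    # L1 misses go straight to DRAM
with_l2 = amat(2, 0.10, amat(12, 0.20, mem_lat))  # L1 misses go to L2 first

print(no_l2)    # 22.0 cycles
print(with_l2)  # 7.2 cycles
```

So by this simple model the L2 configuration should be about 3x faster on
average memory accesses, which is the opposite of what I'm measuring.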
*By the way, I'm not trying to be inflammatory or to insult anyone who might
have edited the code; I'm just trying to get to the bottom of this ASAP so I
can meet my paper deadlines.*
Any input on this would be greatly appreciated.
*Relevant Parts of O3_ARM_v7a.py:*
....
# Load/Store Units
class O3_ARM_v7a_Load(FUDesc):
    opList = [ OpDesc(opClass='MemRead', opLat=2) ]
    count = 1

class O3_ARM_v7a_Store(FUDesc):
    opList = [ OpDesc(opClass='MemWrite', opLat=2) ]
    count = 1
....
# Instruction Cache
class O3_ARM_v7a_ICache(BaseCache):
    hit_latency = 1
    response_latency = 1
...
# Data Cache
class O3_ARM_v7a_DCache(BaseCache):
    hit_latency = 2
    response_latency = 2
...
# TLB Cache
# Use a cache as a L2 TLB
class O3_ARM_v7aWalkCache(BaseCache):
    hit_latency = 4
    response_latency = 4
...
# L2 Cache
class O3_ARM_v7aL2(BaseCache):
    hit_latency = 12
    response_latency = 12
...
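One experiment I'm considering to help debug this (just a sketch against the
classes quoted above; I haven't verified that it changes anything) is to
subclass the load/store functional-unit descriptors with a much larger opLat
and see whether the run time scales with it:

```python
# Sketch only: raise the load/store functional-unit latencies to test
# whether opLat is what dominates the run time. Assumes the FUDesc and
# OpDesc classes from configs/common/O3_ARM_v7a.py quoted above.
class O3_ARM_v7a_Load_Slow(FUDesc):
    opList = [ OpDesc(opClass='MemRead', opLat=200) ]  # was 2
    count = 1

class O3_ARM_v7a_Store_Slow(FUDesc):
    opList = [ OpDesc(opClass='MemWrite', opLat=200) ]  # was 2
    count = 1
```

If the run time barely moves, that would suggest opLat is something other
than the memory latency (and that the real memory latency comes from
elsewhere in the config), which is exactly what I'd like to confirm.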
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users