Hello,

The latency of the main memory is set in a .py file under src/mem. With the base se.py (if you are using syscall emulation), SimpleMemory is used, hence the file to look at is src/mem/SimpleMemory.py. The default latency is 30 ns (30 cycles at 1 GHz, if I remember correctly).
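For reference, the relevant parameter looks roughly like this (a sketch from memory; the exact fields and import layout may differ in your revision, so check your own tree):

```python
# Sketch of src/mem/SimpleMemory.py (from memory; check your checkout).
from m5.params import *
from AbstractMemory import AbstractMemory

class SimpleMemory(AbstractMemory):
    type = 'SimpleMemory'
    # Fixed request-to-response latency; 30 ns = 30 cycles at 1 GHz.
    latency = Param.Latency('30ns', "Request to response latency")
    # Uniform variance added on top of the fixed latency.
    latency_var = Param.Latency('0ns', "Request to response latency variance")
```

You can override `latency` from your config script if you want a slower (or variable) memory.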

The opLat you see in the code you show is the execution latency of the load/store *in the functional unit*; it is in no way the latency of the memory. Similarly, you will probably see the ALU at one cycle and the IntMult at more than one cycle. Hope it helps.
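For example, the integer units in the same file look roughly like this (values from memory; the exact numbers may differ in your copy of configs/common/O3_ARM_v7a.py):

```python
# Sketch of the integer FUs in configs/common/O3_ARM_v7a.py
# (values from memory; exact numbers may differ by revision).
class O3_ARM_v7a_Simple_Int(FUDesc):
    opList = [ OpDesc(opClass='IntAlu', opLat=1) ]   # single-cycle ALU
    count = 2

class O3_ARM_v7a_Complex_Int(FUDesc):
    opList = [ OpDesc(opClass='IntMult', opLat=3, issueLat=1) ]  # multi-cycle multiply
    count = 1
```

opLat is simply how long the operation occupies the execute stage, exactly like the opLat=2 on the load/store units; the cache and memory latencies are added on top by the memory system when the access actually goes out.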

Arthur Perais.

On 09/02/2013 04:53, Gabriel Yessin wrote:
Hey all,

I really need some input on this one.

I was running bbench and noticed the run times for architectures with any size of L2 cache were MUCH slower than those with no L2 cache.

For instance, when loading Twitter, two of the warm-start times were:
0.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, No L2 Cache: 2.547 seconds
0.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, 1024 kB L2 Cache: 5.498 seconds

That's a factor of 2 slowdown for using L2 caches vs no L2 caches.

More results for Twitter (I have even more than this, but just want to show the pattern):
1.0 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, No L2 Cache: 1.666 seconds
1.0 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, 1024 kB L2 Cache: 2.002 seconds

1.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, No L2 Cache: 1.697 seconds
1.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, 1024 kB L2 Cache: 1.991 seconds
1.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, 2048 kB L2 Cache: 1.578 seconds

_*My basic commands:*_

./build/ARM/gem5.fast -v --dump-config=config_single_twitter.ini --outdir=m5out_single_twitter_05GHz_64kB_0kB configs/example/fs.py -b bbench-gb --kernel=/home/gyessin/bbench1site/dist_twitter/m5/system/binaries/vmlinux.smp.mouse.arm --frame-capture --checkpoint-dir=checkpoint_single_twitter --disk-image=/home/gyessin/bbench1site/dist_twitter/m5/system/disks/ARMv7a-Gingerbread-Android.SMP.mouse.nolock.img --caches -s 300000000 -r 1 --l1d_size=64kB --l1i_size=64kB --clock=0.5GHz

and

./build/ARM/gem5.fast -v --dump-config=config_single_msn.ini --outdir=m5out_single_msn_05GHz_16kB_1024kB configs/example/fs.py -b bbench-gb --kernel=/home/gyessin/bbench1site/dist_msn/m5/system/binaries/vmlinux.smp.mouse.arm --frame-capture --checkpoint-dir=checkpoint_single_msn --disk-image=/home/gyessin/bbench1site/dist_msn/m5/system/disks/ARMv7a-Gingerbread-Android.SMP.mouse.nolock.img --caches -s 300000000 -r 1 --l1d_size=16kB --l1i_size=16kB --l2cache --l2_size=1024kB --clock=0.5GHz

(They're restoring from a checkpoint taken right after the sleep 10 in gem5/configs/boot/bbench-gb.rcS, and they are running on a version cloned from the development repository only a few days ago.)



Looking at configs/common/O3_ARM_v7a.py (relevant bits copied and highlighted below), unless I'm misinterpreting something, it would appear that:
L1 Instruction latency = 1 cycle (reasonable)
L1 Data latency = 2 cycles (reasonable)
TLB Cache Latency = 4 cycles (a little low, I think, but fine)
L2 Cache Latency = 12 cycles (reasonable)
*Memory Write Latency = Memory Read Latency = 2 cycles (AS LOW AS L1 DATA?! Seems absurd!)*
Am I understanding this right, or did I misinterpret the code? It really seems absurd; I would assume MemWrite and MemRead should be about 200 CPU cycles, correct?
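For scale, here is a quick back-of-the-envelope conversion (my own arithmetic, assuming a fixed ~30 ns DRAM-style latency purely for illustration) of a fixed memory latency into CPU cycles at the clocks used in these runs:

```python
# Convert a fixed memory latency in nanoseconds to CPU cycles.
# cycles = latency_seconds * frequency_Hz = latency_ns * clock_GHz
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    return latency_ns * clock_ghz

for ghz in (0.5, 1.0, 1.5):
    print(f"30 ns @ {ghz} GHz = {ns_to_cycles(30, ghz):.0f} cycles")
```

So even a modest 30 ns latency works out to 15-45 cycles at these clocks: far more than 2, though short of the ~200 I'd expect for a full DRAM access.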

By the way, I'm not trying to be inflammatory or to insult anyone who might have edited the code; I'm just trying to get to the bottom of this ASAP so I can meet my paper deadlines.

Any input on this would be greatly appreciated.

_*Relevant Parts of O3_ARM_v7a.py:*_
....
# Load/Store Units
class O3_ARM_v7a_Load(FUDesc):
    opList = [ OpDesc(opClass='MemRead', opLat=2) ]
    count = 1

class O3_ARM_v7a_Store(FUDesc):
    opList = [ OpDesc(opClass='MemWrite', opLat=2) ]
    count = 1
....

# Instruction Cache
class O3_ARM_v7a_ICache(BaseCache):
    hit_latency = 1
    response_latency = 1
...

# Data Cache
class O3_ARM_v7a_DCache(BaseCache):
    hit_latency = 2
    response_latency = 2
...

# TLB Cache
# Use a cache as a L2 TLB
class O3_ARM_v7aWalkCache(BaseCache):
    hit_latency = 4
    response_latency = 4

...

# L2 Cache
class O3_ARM_v7aL2(BaseCache):
    hit_latency = 12
    response_latency = 12
...


_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

