Hello,

The latency of the main memory is set in a .py file under src/mem. With the base se.py (if you are using syscall emulation), SimpleMemory is used, hence the file to look at is src/mem/SimpleMemory.py. The default latency is 30 ns (30 cycles at 1 GHz, if I remember correctly).
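For reference, the relevant parameter looks roughly like this (a sketch from memory; the exact fields and import layout may differ in your revision, so check your own tree):

```python
# Sketch of src/mem/SimpleMemory.py (from memory; check your checkout).
from m5.params import *
from AbstractMemory import AbstractMemory

class SimpleMemory(AbstractMemory):
    type = 'SimpleMemory'
    # Fixed request-to-response latency; 30 ns = 30 cycles at 1 GHz.
    latency = Param.Latency('30ns', "Request to response latency")
    # Uniform variance added on top of the fixed latency.
    latency_var = Param.Latency('0ns', "Request to response latency variance")
```

You can override `latency` from your config script if you want a slower (or variable) memory.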

The opLat you see in the code you show is the execution latency of the load/store *in the functional unit*; it is in no way the latency of the memory. Similarly, you will probably see the ALU at one cycle and the IntMult at more than one cycle. Hope it helps.
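For example, the integer units in the same file look roughly like this (values from memory; the exact numbers may differ in your copy of configs/common/O3_ARM_v7a.py):

```python
# Sketch of the integer FUs in configs/common/O3_ARM_v7a.py
# (values from memory; exact numbers may differ by revision).
class O3_ARM_v7a_Simple_Int(FUDesc):
    opList = [ OpDesc(opClass='IntAlu', opLat=1) ]   # single-cycle ALU
    count = 2

class O3_ARM_v7a_Complex_Int(FUDesc):
    opList = [ OpDesc(opClass='IntMult', opLat=3, issueLat=1) ]  # multi-cycle multiply
    count = 1
```

opLat is simply how long the operation occupies the execute stage, exactly like the opLat=2 on the load/store units; the cache and memory latencies are added on top by the memory system when the access actually goes out.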

Arthur Perais.

On 09/02/2013 04:53, Gabriel Yessin wrote:
Hey all,

I really need some input on this one.

I was running bbench and noticed the run times for architectures with any size of L2 cache were MUCH slower than those with no L2 cache.

For instance, when loading Twitter, two of the warm-start times were:
0.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, No L2 Cache: 2.547 seconds
0.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, 1024 kB L2 Cache: 5.498 seconds

That's a factor of 2 slowdown for using L2 caches vs no L2 caches.

More results for Twitter (I have even more than this, but just want to show the pattern):
1.0 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, No L2 Cache: 1.666 seconds
1.0 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, 1024 kB L2 Cache: 2.002 seconds

1.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, No L2 Cache: 1.697 seconds
1.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, 1024 kB L2 Cache: 1.991 seconds
1.5 GHz, 16 kB L1 Inst Cache, 16 kB Data Cache, 2048 kB L2 Cache: 1.578 seconds

_*My basic commands:*_

./build/ARM/gem5.fast -v --dump-config=config_single_twitter.ini --outdir=m5out_single_twitter_05GHz_64kB_0kB configs/example/fs.py -b bbench-gb --kernel=/home/gyessin/bbench1site/dist_twitter/m5/system/binaries/vmlinux.smp.mouse.arm --frame-capture --checkpoint-dir=checkpoint_single_twitter --disk-image=/home/gyessin/bbench1site/dist_twitter/m5/system/disks/ARMv7a-Gingerbread-Android.SMP.mouse.nolock.img --caches -s 300000000 -r 1 --l1d_size=64kB --l1i_size=64kB --clock=0.5GHz

and

./build/ARM/gem5.fast -v --dump-config=config_single_msn.ini --outdir=m5out_single_msn_05GHz_16kB_1024kB configs/example/fs.py -b bbench-gb --kernel=/home/gyessin/bbench1site/dist_msn/m5/system/binaries/vmlinux.smp.mouse.arm --frame-capture --checkpoint-dir=checkpoint_single_msn --disk-image=/home/gyessin/bbench1site/dist_msn/m5/system/disks/ARMv7a-Gingerbread-Android.SMP.mouse.nolock.img --caches -s 300000000 -r 1 --l1d_size=16kB --l1i_size=16kB --l2cache --l2_size=1024kB --clock=0.5GHz

(They're restoring from a checkpoint taken right after the sleep 10 in gem5/configs/boot/bbench-gb.rcS, and they are running on a version cloned from the development repository only a few days ago.)



Looking at configs/common/O3_ARM_v7a.py (relevant bits copied and highlighted below), unless I'm misinterpreting something, it would appear that:
L1 Instruction latency = 1 cycle (reasonable)
L1 Data latency = 2 cycles (reasonable)
TLB Cache Latency = 4 cycles (a little low, I think, but fine)
L2 Cache Latency = 12 cycles (reasonable)
*Memory Write Latency = Memory Read Latency = 2 cycles (AS LOW AS L1 DATA?! Seems absurd!)*
Am I understanding this right, or did I misinterpret the code? It really seems absurd; I would assume MemWrite and MemRead should be about 200 CPU cycles, correct?
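For scale, here is a quick back-of-the-envelope conversion (my own arithmetic, assuming a fixed ~30 ns DRAM-style latency purely for illustration) of a fixed memory latency into CPU cycles at the clocks used in these runs:

```python
# Convert a fixed memory latency in nanoseconds to CPU cycles.
# cycles = latency_seconds * frequency_Hz = latency_ns * clock_GHz
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    return latency_ns * clock_ghz

for ghz in (0.5, 1.0, 1.5):
    print(f"30 ns @ {ghz} GHz = {ns_to_cycles(30, ghz):.0f} cycles")
```

So even a modest 30 ns latency works out to 15-45 cycles at these clocks: far more than 2, though short of the ~200 I'd expect for a full DRAM access.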

By the way, I'm not trying to be inflammatory or to insult anyone who might have edited the code; I'm just trying to get to the bottom of this ASAP so I can meet my paper deadlines.

Any input on this would be greatly appreciated.

_*Relevant Parts of O3_ARM_v7a.py:*_
....
# Load/Store Units
class O3_ARM_v7a_Load(FUDesc):
    opList = [ OpDesc(opClass='MemRead', opLat=2) ]
    count = 1

class O3_ARM_v7a_Store(FUDesc):
    opList = [ OpDesc(opClass='MemWrite', opLat=2) ]
    count = 1
....

# Instruction Cache
class O3_ARM_v7a_ICache(BaseCache):
    hit_latency = 1
    response_latency = 1
...

# Data Cache
class O3_ARM_v7a_DCache(BaseCache):
    hit_latency = 2
    response_latency = 2
...

# TLB Cache
# Use a cache as a L2 TLB
class O3_ARM_v7aWalkCache(BaseCache):
    hit_latency = 4
    response_latency = 4

...

# L2 Cache
class O3_ARM_v7aL2(BaseCache):
    hit_latency = 12
    response_latency = 12
...


_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

