Hello everyone,

Recently I have been studying memory access time (i.e., the duration of memory 
loads and stores) in terms of CPU cycles in a multicore system. I set up an 
Alpha timing CPU and have run several full-system simulations with PARSEC 
workloads. To look into the details of the memory access procedure, I turned 
on the Cache debug trace.
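
For context, my invocation was roughly along these lines (a simplified sketch, not my exact command; the config script path and option spellings are assumptions and may differ between gem5 versions):

```shell
# Full-system Alpha run: 4 timing CPUs, private caches plus a shared L2,
# with the Cache debug flag enabled so coherence activity shows up in the trace.
build/ALPHA/gem5.opt --debug-flags=Cache \
    configs/example/fs.py \
    --cpu-type=timing --caches --l2cache --num-cpus=4
```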

However, I was very disappointed to see that the entire memory access is 
treated "atomically". To illustrate my concern, I paste the following Cache 
trace segment:


3587305218000: system.cpu3.dcache: ReadReq addr 0x6bcac8 size 8 (ns) miss
3587305218000: system.cpu3.dcache: createMissPacket created ReadSharedReq from 
ReadReq for  addr 0x6bcac0 size 32
3587305218000: system.cpu3.dcache: Sending an atomic ReadSharedReq for 0x6bcac0 
(ns)
3587305218000: system.cpu0.dcache: handleSnoop snoop hit for CleanEvict addr 
0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 
dirty: 0 tag: 10c03
3587305218000: system.cpu0.dcache: Found addr 0x8601c0 in upper level cache for 
snoop CleanEvict from lower cache
3587305218000: system.cpu1.dcache: handleSnoop snoop hit for CleanEvict addr 
0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 
dirty: 0 tag: 10c03
3587305218000: system.cpu1.dcache: Found addr 0x8601c0 in upper level cache for 
snoop CleanEvict from lower cache
3587305218000: system.cpu3.dcache: Receive response: ReadResp for addr 0x6bcac0 
(ns) in state 0
3587305218000: system.cpu3.dcache: replacement: replacing 0x3f0d0040 (ns) with 
0x6bcac0 (ns): writeback
3587305218000: system.cpu3.dcache: Create Writeback 0x3f0d0040 writable: 1, 
dirty: 1
3587305218000: system.cpu3.dcache: Block addr 0x6bcac0 (ns) moving from state 0 
to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: d795


As you can see above, cpu3 initiates a read request at the very beginning but 
encounters a cache miss, which triggers a series of cache actions due to cache 
coherency. However, they ALL take place at the same time tick, as if every 
memory access, whether a cache miss or a hit, takes ZERO time!


According to the gem5 documentation, the TimingSimpleCPU is the version of 
SimpleCPU that uses timing memory accesses: it stalls on cache accesses and 
waits for the memory system to respond before proceeding. Based on that, I did 
not expect atomic-like behavior from the timing CPU; each memory access should 
have taken a non-zero amount of time.


Has anybody had the same experience, and can you explain the reason for this?


Or is there a CPU model that behaves non-atomically and can be used in a 
multicore system? As far as I know, only the O3 CPU does this job, but it is 
out of order, and I need an in-order CPU.


Thanks and best regards,

Mengyu Liang

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
