Hello everyone,

Recently I have been studying memory access time (i.e., the duration of memory loads and stores) in terms of CPU cycles in a multicore system. I set up an Alpha system with the timing CPU model and ran several full-system simulations with PARSEC workloads. To look into the details of the memory access procedure, I turned on the Cache debug trace.
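For reference, I enable the trace roughly as follows (binary and script paths are from my checkout and are illustrative; older gem5 releases spell the option --trace-flags instead of --debug-flags):

```shell
# Enable the Cache debug/trace flag for a full-system run
# (paths and options are illustrative; adjust to your own build/config)
./build/ALPHA/gem5.opt --debug-flags=Cache \
    configs/example/fs.py --cpu-type=timing --num-cpus=4 --caches
```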
However, I was very disappointed to see that each memory access is treated "atomically". To illustrate my doubt, I paste the following Cache trace segment:

3587305218000: system.cpu3.dcache: ReadReq addr 0x6bcac8 size 8 (ns) miss
3587305218000: system.cpu3.dcache: createMissPacket created ReadSharedReq from ReadReq for addr 0x6bcac0 size 32
3587305218000: system.cpu3.dcache: Sending an atomic ReadSharedReq for 0x6bcac0 (ns)
3587305218000: system.cpu0.dcache: handleSnoop snoop hit for CleanEvict addr 0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 tag: 10c03
3587305218000: system.cpu0.dcache: Found addr 0x8601c0 in upper level cache for snoop CleanEvict from lower cache
3587305218000: system.cpu1.dcache: handleSnoop snoop hit for CleanEvict addr 0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 tag: 10c03
3587305218000: system.cpu1.dcache: Found addr 0x8601c0 in upper level cache for snoop CleanEvict from lower cache
3587305218000: system.cpu3.dcache: Receive response: ReadResp for addr 0x6bcac0 (ns) in state 0
3587305218000: system.cpu3.dcache: replacement: replacing 0x3f0d0040 (ns) with 0x6bcac0 (ns): writeback
3587305218000: system.cpu3.dcache: Create Writeback 0x3f0d0040 writable: 1, dirty: 1
3587305218000: system.cpu3.dcache: Block addr 0x6bcac0 (ns) moving from state 0 to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: d795

As you can see above, cpu3 initiates a read request at the very beginning but encounters a cache miss, which triggers a series of cache actions due to cache coherence. However, they ALL take place at the same time tick, as if every memory access, whether a cache miss or a hit, takes ZERO time! As per the gem5 documentation: "The TimingSimpleCPU is the version of SimpleCPU that uses timing memory accesses. It stalls on cache accesses and waits for the memory system to respond prior to proceeding."
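For completeness, this is roughly how I understand the timing model is selected in a config script (a sketch using gem5's Python config API; class and attribute names may differ between versions, and the values here are placeholders, not my actual configuration):

```python
# Sketch: selecting the timing memory mode and CPU model in a gem5
# config script. Names follow gem5's classic Python config API;
# clock/core-count values are illustrative only.
import m5
from m5.objects import System, TimingSimpleCPU, SrcClockDomain, VoltageDomain

system = System()
system.clk_domain = SrcClockDomain(clock='2GHz',
                                   voltage_domain=VoltageDomain())

# 'timing' (as opposed to 'atomic') is what should make cache accesses
# consume simulated time instead of completing in a single tick
system.mem_mode = 'timing'

# one TimingSimpleCPU per core in the multicore system
system.cpu = [TimingSimpleCPU(cpu_id=i) for i in range(4)]
```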
Based on that, I didn't expect atomic-like behavior from the timing CPU; each memory access should have exhibited a non-zero duration. Has anybody had the same experience and can explain the reason for this? Alternatively, is there a CPU model that behaves non-atomically and can be used in a multicore system? As far as I know, only the O3 CPU does this job, but it is out of order, and I need an in-order CPU.

Thanks and best regards,
Mengyu Liang
_______________________________________________ gem5-users mailing list [email protected] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
