Hello Mengyu Liang,
I would recommend that you check out Uri Wiener's thesis (Modeling and
Analysis of a Cache Coherent Interconnect).

He describes the decisions made in the implementation of the CCI model in
gem5.

Quoting page 25: "Snoop requests from the slave are handled and forwarded
in zero time. This major inaccuracy is intended for avoiding race
conditions in the memory system, and mostly the need to implement
transition-states in the cache-controller."
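In other words, the zero-time snoops you see are a property of gem5's classic cache model, not of the CPU model. The CPU choice only controls how the request itself is timed. As a rough sketch (assuming a standard gem5 Python config where `system` already exists; exact class names depend on your gem5 version), an in-order CPU with timing memory accesses would be configured like this:

```python
# Sketch of a gem5 config fragment (requires gem5's m5.objects; not
# runnable standalone). TimingSimpleCPU stalls on cache accesses, but
# coherence snoops in the classic memory system are still serviced in
# zero time, as the thesis explains.
from m5.objects import TimingSimpleCPU, MinorCPU

# In-order CPU with timing memory accesses:
system.cpu = [TimingSimpleCPU(cpu_id=i) for i in range(4)]

# Alternatively, MinorCPU is gem5's in-order *pipelined* model with
# more detailed timing, if TimingSimpleCPU is too coarse:
# system.cpu = [MinorCPU(cpu_id=i) for i in range(4)]
```

This is only illustrative; the snoop-timing behavior quoted above will be the same with either CPU, since it lives in the cache controller.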

On Sun, Nov 6, 2016 at 7:03 PM, mengyu liang <[email protected]> wrote:

> Hello everyone,
>
>
> Recently I have been studying memory access time (i.e. the duration of
> memory loads and stores) in terms of CPU cycles in a multicore system. I
> chose the Alpha timing CPU and have run several full-system simulations
> with PARSEC workloads. In order to look into the details of the memory
> access procedure, I turned on the Cache debug trace.
>
> However, I was very disappointed to see that the entire memory access is
> treated "atomically". To illustrate my doubt, I paste the following Cache
> trace segment:
>
> 3587305218000: system.cpu3.dcache: ReadReq addr 0x6bcac8 size 8 (ns) miss
> 3587305218000: system.cpu3.dcache: createMissPacket created ReadSharedReq from ReadReq for addr 0x6bcac0 size 32
> 3587305218000: system.cpu3.dcache: Sending an atomic ReadSharedReq for 0x6bcac0 (ns)
> 3587305218000: system.cpu0.dcache: handleSnoop snoop hit for CleanEvict addr 0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 tag: 10c03
> 3587305218000: system.cpu0.dcache: Found addr 0x8601c0 in upper level cache for snoop CleanEvict from lower cache
> 3587305218000: system.cpu1.dcache: handleSnoop snoop hit for CleanEvict addr 0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 tag: 10c03
> 3587305218000: system.cpu1.dcache: Found addr 0x8601c0 in upper level cache for snoop CleanEvict from lower cache
> 3587305218000: system.cpu3.dcache: Receive response: ReadResp for addr 0x6bcac0 (ns) in state 0
> 3587305218000: system.cpu3.dcache: replacement: replacing 0x3f0d0040 (ns) with 0x6bcac0 (ns): writeback
> 3587305218000: system.cpu3.dcache: Create Writeback 0x3f0d0040 writable: 1, dirty: 1
> 3587305218000: system.cpu3.dcache: Block addr 0x6bcac0 (ns) moving from state 0 to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: d795
>
>
> As you can see above, cpu3 initiates a read request at the very beginning
> but encounters a cache miss, which triggers a series of cache actions due
> to cache coherence. However, they ALL take place at the same time tick,
> as if every memory access, whether a cache miss or a hit, takes ZERO
> time!
>
>
> As per the gem5 documentation: *The TimingSimpleCPU is the version of
> SimpleCPU that uses timing memory accesses. It stalls on cache accesses
> and waits for the memory system to respond prior to proceeding.* Based on
> that, I didn't expect atomic-like behavior from the timing CPU; it should
> exhibit a non-zero duration for each memory access.
>
>
> Has anybody had the same experience, and can you explain the reason for
> this?
>
>
> Or is there any CPU model that behaves non-atomically and can be used in
> a multicore system? As far as I know, only the O3 CPU does this job, but
> it is out of order, and I need an in-order CPU.
>
>
> Thanks and best regards,
>
> Mengyu Liang
>
>
>
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>
