Dear all,

Thanks a lot for all your explanations below. I'm now sticking to the classic 
XBar memory system rather than Ruby, and I accept that state transitions and 
cache coherence take zero time in this case.

However, today I studied the exec debug trace again for an ALPHA FS simulation 
and found the following interesting entries:


3334580479000: system.switch_cpus02 T0 : 0x12000867c    : ldq        
r2,29968(r1)    : MemRead :  A=0x1200adda8

......

3334580495000: system.switch_cpus02 T0 : 0x12000867c    : ldq        
r2,29968(r1)    : MemRead :  D=0x00000001200adda8 A=0x1200adda8


You can see that at the first entry cpu02 tries to read from address 
A=0x1200adda8, but no data is shown. Some time later, at the second entry, the 
same core at the same instruction address accesses the same data address with 
the same registers, but this time valid data is returned: D=0x00000001200adda8.

Can I interpret the first entry as the memory access request and the second as 
the data response? Does this have something to do with a cache miss?

If you compare this with the Cache debug trace, you will find that the first 
entry is not recorded there; only the second entry appears in the cache trace.

So what happened at the first entry?

I should add that this kind of access makes up only a very small percentage of 
all memory accesses. Most accesses already have their data at the first entry.


There is also another kind of memory access in the exec trace that has neither 
a data address A=0x.... nor returned data D=0x...; an example is below:


3334580433000: system.switch_cpus00 T0 : @iowrite8+36    : mb                   
      : MemRead :


How can this be explained?


PS: I still don't know how to reply to an existing topic on the gem5 mailing 
list instead of opening a new one.

Thanks in advance.


Best regards,

Mengyu



________________________________
From: mengyu liang <[email protected]>
Sent: Sunday, 6 November 2016 22:03
To: gem5 forum
Subject: Understanding of cache trace of ALPHA timing CPU


Hello everyone,

Recently I have been studying memory access time (i.e. the duration of memory 
loads and stores) in terms of CPU cycles in a multicore system. I chose the 
ALPHA timing CPU and have run several full-system simulations with PARSEC 
workloads. To look into the details of the memory access procedure, I turned 
on the Cache debug trace.

However, I was very disappointed to see that the entire memory access is 
treated "atomically". To illustrate my doubt, I paste the following Cache 
trace segment:


3587305218000: system.cpu3.dcache: ReadReq addr 0x6bcac8 size 8 (ns) miss
3587305218000: system.cpu3.dcache: createMissPacket created ReadSharedReq from 
ReadReq for  addr 0x6bcac0 size 32
3587305218000: system.cpu3.dcache: Sending an atomic ReadSharedReq for 0x6bcac0 
(ns)
3587305218000: system.cpu0.dcache: handleSnoop snoop hit for CleanEvict addr 
0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 
dirty: 0 tag: 10c03
3587305218000: system.cpu0.dcache: Found addr 0x8601c0 in upper level cache for 
snoop CleanEvict from lower cache
3587305218000: system.cpu1.dcache: handleSnoop snoop hit for CleanEvict addr 
0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 
dirty: 0 tag: 10c03
3587305218000: system.cpu1.dcache: Found addr 0x8601c0 in upper level cache for 
snoop CleanEvict from lower cache
3587305218000: system.cpu3.dcache: Receive response: ReadResp for addr 0x6bcac0 
(ns) in state 0
3587305218000: system.cpu3.dcache: replacement: replacing 0x3f0d0040 (ns) with 
0x6bcac0 (ns): writeback
3587305218000: system.cpu3.dcache: Create Writeback 0x3f0d0040 writable: 1, 
dirty: 1
3587305218000: system.cpu3.dcache: Block addr 0x6bcac0 (ns) moving from state 0 
to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: d795


As you can see above, cpu3 initiates a read request at the very beginning but 
encounters a cache miss, which triggers a series of cache-coherence actions. 
However, they ALL take place at the same tick, as if every memory access, 
whether a cache miss or a hit, takes ZERO time!
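To make the observation concrete, here is a quick sanity check I wrote myself 
(not part of gem5) over three of the cpu3.dcache lines from the segment above, 
confirming that the request, the coherence traffic, and the response all carry 
the same timestamp:

```python
def ticks(trace_lines):
    """Return the leading tick (timestamp) of each cache-trace line."""
    return [int(line.split(':', 1)[0]) for line in trace_lines]

# Three cpu3.dcache events from the segment above, request to response:
segment = [
    "3587305218000: system.cpu3.dcache: ReadReq addr 0x6bcac8 size 8 (ns) miss",
    "3587305218000: system.cpu3.dcache: Sending an atomic ReadSharedReq for 0x6bcac0 (ns)",
    "3587305218000: system.cpu3.dcache: Receive response: ReadResp for addr 0x6bcac0 (ns) in state 0",
]
assert len(set(ticks(segment))) == 1  # miss, coherence and response share one tick
```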


According to the gem5 documentation, the TimingSimpleCPU is the version of 
SimpleCPU that uses timing memory accesses: it stalls on cache accesses and 
waits for the memory system to respond before proceeding. Based on that, I did 
not expect atomic-like behavior from the timing CPU; each memory access should 
exhibit a non-zero duration.


Has anybody seen the same behavior and can explain the reason for it?


Or is there a CPU model that behaves non-atomically and can be used in a 
multicore system? As far as I know, only the O3 CPU does this, but it is out 
of order, and I need an in-order CPU.


Thanks and best regards,

Mengyu Liang

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
