Re: [gem5-users] Understanding of cache trace of ALPHA timing CPU

Jason Lowe-Power Sun, 20 Nov 2016 12:25:13 -0800

Hello,

To reply to a post, you should just click "reply" in your email client.


For your question... I would look at the code that is executed with the
Exec debug flag. By reading the code you should be able to step through and
figure out what's going on.
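
While stepping through, it can help to line up the two entries for the same
load mechanically. Below is a minimal sketch, not part of gem5, that pairs an
address-only Exec entry with the later entry carrying data and reports the
tick difference. The line pattern is inferred only from the entries quoted
below, and the names parse_line and pair_split_accesses are hypothetical.
Reading the first entry as "issue" and the second as "completion" is an
assumption the script does not verify.

```python
import re

# Pattern for one Exec trace line, modelled only on the entries quoted in this
# thread (assumption: other gem5 versions or ISAs may format lines differently).
LINE_RE = re.compile(
    r"^(?P<tick>\d+): (?P<cpu>\S+) T\d+ : (?P<pc>\S+)\s*: "
    r"(?P<inst>.*?)\s*: (?P<cls>\w+) :\s*(?P<rest>.*)$"
)


def parse_line(line):
    """Return a dict for one trace line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Trailing fields look like "D=0x... A=0x..."; either may be absent.
    fields = dict(
        kv.split("=", 1) for kv in m.group("rest").split() if "=" in kv
    )
    return {
        "tick": int(m.group("tick")),
        "cpu": m.group("cpu"),
        "pc": m.group("pc"),
        "inst": m.group("inst"),
        "addr": fields.get("A"),
        "data": fields.get("D"),
    }


def pair_split_accesses(lines):
    """Pair address-only entries with the later entry that carries data."""
    pending = {}  # (cpu, pc, addr) -> tick of the address-only entry
    pairs = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["addr"] is None:
            continue  # unparsable line, or no address (e.g. the "mb" entry)
        key = (entry["cpu"], entry["pc"], entry["addr"])
        if entry["data"] is None:
            pending[key] = entry["tick"]
        elif key in pending:
            start = pending.pop(key)
            pairs.append(key + (start, entry["tick"], entry["tick"] - start))
    return pairs


# The three Exec entries quoted in this thread, each joined onto one line.
SAMPLE = [
    "3334580479000: system.switch_cpus02 T0 : 0x12000867c    : ldq r2,29968(r1)    : MemRead :  A=0x1200adda8",
    "3334580495000: system.switch_cpus02 T0 : 0x12000867c    : ldq r2,29968(r1)    : MemRead :  D=0x00000001200adda8 A=0x1200adda8",
    "3334580433000: system.switch_cpus00 T0 : @iowrite8+36    : mb                         : MemRead :",
]

for cpu, pc, addr, start, end, delta in pair_split_accesses(SAMPLE):
    print(f"{cpu} pc={pc} addr={addr}: {start} -> {end} ({delta} ticks)")
```

For the two ldq entries quoted below, this reports a gap of 16000 ticks. The
mb entry is skipped because it carries no address.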

Jason

On Thu, Nov 17, 2016 at 4:12 PM mengyu liang <[email protected]> wrote:

> Dear all,
>
> Thanks a lot for all your explanations below. I'm now sticking to the
> classic crossbar (XBar) memory system, not Ruby. I accept the fact that
> state transitions for cache coherence take zero time in this case.
>
> However, today I studied the Exec debug trace again for an ALPHA FS
> simulation and found the following interesting entries:
>
>
> 3334580479000: system.switch_cpus02 T0 : 0x12000867c    : ldq r2,29968(r1)    : MemRead :  A=0x1200adda8
>
> ......
>
> 3334580495000: system.switch_cpus02 T0 : 0x12000867c    : ldq r2,29968(r1)    : MemRead :  D=0x00000001200adda8 A=0x1200adda8
>
>
> As you can see, in the first entry cpu02 tries to read from address
> A=0x1200adda8, but no data is shown. Some time later, in the second entry,
> the same core at the same instruction address accesses the same data
> address with the same registers. This time valid data is returned:
> D=0x00000001200adda8.
>
> Can I interpret the first entry as the memory access request and the
> second entry as the data acknowledgement? Does it have something to do
> with a cache miss?
>
> If you compare this with the Cache debug trace, you will find that the
> first entry does not appear there. The cache trace only has a record for
> the second entry.
>
> So what happened at the first entry?
>
> I should add that accesses of this kind make up only a very small
> percentage of all memory accesses. Most memory accesses already have their
> data at their first entry.
>
>
> There are also other kinds of memory accesses in the Exec trace that have
> neither a data address (A=0x...) nor returned data (D=0x...). An example
> is below:
>
>
> 3334580433000: system.switch_cpus00 T0 : @iowrite8+36    : mb                         : MemRead :
>
>
> How can this be explained?
>
>
> PS: I still don't know how to reply so that my post is attached to an
> existing topic on the gem5 mailing list instead of opening a new topic.
>
> Thanks in advance.
>
>
> Best regards,
> Mengyu
>
>
>
> ------------------------------
> *From:* mengyu liang <[email protected]>
> *Sent:* Sunday, 6 November 2016 22:03
> *To:* gem5 forum
> *Subject:* Understanding of cache trace of ALPHA timing CPU
>
>
> Hello everyone,
>
>
> Recently I have been studying memory access time (i.e. the duration of
> memory loads and stores) in terms of CPU cycles in a multicore system. I
> chose the ALPHA timing CPU and have run several full-system simulations
> with PARSEC workloads. To look into the details of the memory access
> procedure, I turned on the Cache debug trace.
>
> However, I was very disappointed to see that the entire memory access is
> treated "atomically". To illustrate my doubt, I have pasted the following
> Cache trace segment:
>
> 3587305218000: system.cpu3.dcache: ReadReq addr 0x6bcac8 size 8 (ns) miss
> 3587305218000: system.cpu3.dcache: createMissPacket created ReadSharedReq from ReadReq for  addr 0x6bcac0 size 32
> 3587305218000: system.cpu3.dcache: Sending an atomic ReadSharedReq for 0x6bcac0 (ns)
> 3587305218000: system.cpu0.dcache: handleSnoop snoop hit for CleanEvict addr 0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 tag: 10c03
> 3587305218000: system.cpu0.dcache: Found addr 0x8601c0 in upper level cache for snoop CleanEvict from lower cache
> 3587305218000: system.cpu1.dcache: handleSnoop snoop hit for CleanEvict addr 0x8601c0 size 32, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 tag: 10c03
> 3587305218000: system.cpu1.dcache: Found addr 0x8601c0 in upper level cache for snoop CleanEvict from lower cache
> 3587305218000: system.cpu3.dcache: Receive response: ReadResp for addr 0x6bcac0 (ns) in state 0
> 3587305218000: system.cpu3.dcache: replacement: replacing 0x3f0d0040 (ns) with 0x6bcac0 (ns): writeback
> 3587305218000: system.cpu3.dcache: Create Writeback 0x3f0d0040 writable: 1, dirty: 1
> 3587305218000: system.cpu3.dcache: Block addr 0x6bcac0 (ns) moving from state 0 to state: 7 (E) valid: 1 writable: 1 readable: 1 dirty: 0 tag: d795
>
>
> As you can see above, cpu3 initiates a read request at the very beginning
> but encounters a cache miss. This triggers a series of cache actions for
> cache coherence. However, they ALL take place at the same tick, as if
> every memory access, whether it is a cache miss or a hit, takes ZERO time!
>
>
> According to the gem5 documentation, *The TimingSimpleCPU is the version of
> SimpleCPU that uses timing memory accesses. It stalls on cache accesses and
> waits for the memory system to respond prior to proceeding*. Based on
> that, I did not expect atomic-like behavior from the timing CPU. It should
> exhibit a non-zero duration for each memory access.
>
>
> Has anybody had the same experience, and can anyone explain the reason?
>
>
> Or is there any CPU model that behaves non-atomically and can be used in
> a multicore system? As far as I know, only the O3 CPU does this, but it is
> out-of-order. I need an in-order CPU.
>
>
> Thanks and best regards,
>
> Mengyu Liang
>
>
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

