Hi all,

 

       I would like to bring up the accuracy of gem5 again. When running
BBench on gem5, I want to know whether anybody has compared
microarchitectural metrics, such as L1 instruction cache misses, measured in
gem5 against those from a real hardware board. Also, which parameters should
I change to obtain similar results with the arm_detailed CPU model? I have
already tried adjusting several important gem5 configuration parameters,
but I still have not obtained the expected results.

       If simulator results differ greatly from the real platform, any CPU
architecture optimizations evaluated on the simulator would be useless. I
see that many users simulate the ARM platform with gem5. Has anybody else
run into the same problem?

 

       Thanks.

 

Best regards,

Yongbing Huang

 

 

From: [email protected] [mailto:[email protected]] On
Behalf Of huangyongbing
Sent: Monday, January 28, 2013 10:11 AM
To: 'gem5 users mailing list'
Subject: [SPAM] Re: [gem5-users] Mismatched stats between gem5 and
performance counters when running BBench on ARM platform

 

Hi Orangeade,

 

         Thanks for your reply. I have indeed done some work to localize the
problem.

 

1)       I use the arm_detailed model in gem5. I also disabled the
prefetcher in gem5, as on the real ARM platform.

2)       I have already changed the default 64B cache line to a 32B cache
line.

3)       I was aware of this, so I ran a CPU-only micro-benchmark on both
the ARM platform and gem5. The results look similar to those from running
BBench. I will look into this further.

4)       The real ARM platform uses a round-robin cache replacement policy,
but I use the LRU replacement policy in gem5. I don't know how much of the
difference is caused by the replacement policy. As a next step, I will
implement round robin in gem5 and test again.
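
To get a feel for point 4 before touching the simulator, the two policies
can be compared on a toy cache model. The sketch below is not gem5 code; the
function name, its parameters and the address trace are made up for
illustration, and it says nothing about how large the effect is on BBench.

```python
def count_misses(addresses, num_sets, ways, line_size, policy):
    """Count misses in a toy set-associative cache.

    policy is "lru" (evict least recently used) or "rr" (round robin:
    a per-set pointer cycles through the ways, ignoring hits).
    """
    if policy not in ("lru", "rr"):
        raise ValueError("policy must be 'lru' or 'rr'")
    sets = [[] for _ in range(num_sets)]   # tags currently held by each set
    next_victim = [0] * num_sets           # round-robin pointer per set
    misses = 0
    for addr in addresses:
        line = addr // line_size
        idx = line % num_sets
        tag = line // num_sets
        tags = sets[idx]
        if tag in tags:
            if policy == "lru":
                # Move the hit tag to the most-recently-used end.
                tags.remove(tag)
                tags.append(tag)
            continue
        misses += 1
        if len(tags) < ways:
            tags.append(tag)               # set not full yet, no eviction
        elif policy == "lru":
            tags.pop(0)                    # front of the list is the LRU tag
            tags.append(tag)
        else:
            tags[next_victim[idx]] = tag   # overwrite the way under the pointer
            next_victim[idx] = (next_victim[idx] + 1) % ways
    return misses


# Hypothetical trace: three 32B lines mapping to one 2-way set.
trace = [0, 32, 0, 64, 0]
lru_misses = count_misses(trace, num_sets=1, ways=2, line_size=32, policy="lru")
rr_misses = count_misses(trace, num_sets=1, ways=2, line_size=32, policy="rr")
print(lru_misses, rr_misses)
```

On this trace LRU keeps the reused line and round robin evicts it, so the
miss counts differ (3 versus 4). Whether a real workload behaves like this
depends on its reuse pattern.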

 

Thanks!

 

Best regards,

 

Yongbing Huang

 

From: [email protected] [mailto:[email protected]] On
Behalf Of Mr. Orangeade
Sent: Monday, January 28, 2013 3:07 AM
To: [email protected]
Subject: Re: [gem5-users] Mismatched stats between gem5 and performance
counters when running BBench on ARM platform

 


Hi Yongbing,

I don't have any 100% solution for you but have a few questions which may
help you to localize the problem:

(1) Which type of model (functional or arm_detailed) do you run to collect
the stats?
    Theoretically you should run 'arm_detailed' to take into account
speculative misses.

(2) 'arm_detailed' uses a 64B cache line while Cortex-A9 has a 32B cache
line.
    Did you take this into account (i.e. change the 'arm_detailed' cache
line size to 32B)?

(3) I am not sure about Chromium, but browsers in general may use the GPU
for compositing on real HW, so the execution path will differ from SW-only
BBench in gem5.
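
As a rough illustration of why question (2) matters: for purely sequential
instruction fetch into an empty cache, the number of misses is the number of
distinct lines touched, so halving the line size about doubles the misses.
The helper below is a hypothetical sketch that ignores capacity misses,
conflict misses and prefetching.

```python
def cold_misses(start, num_bytes, line_size):
    """Number of distinct cache lines touched by a sequential byte range."""
    if num_bytes <= 0 or line_size <= 0:
        raise ValueError("num_bytes and line_size must be positive")
    first_line = start // line_size
    last_line = (start + num_bytes - 1) // line_size
    return last_line - first_line + 1


# Hypothetical 4 KiB straight-line code region starting at address 0.
misses_64 = cold_misses(0, 4096, 64)
misses_32 = cold_misses(0, 4096, 32)
print(misses_64, misses_32)
```

For this region the 64B configuration touches 64 lines and the 32B
configuration touches 128, a 2x gap from line size alone.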

Orangeade

Yongbing wrote:

Hi all,

         I recently compared micro-architectural metrics, such as L1 cache
misses, collected by gem5 with those collected by performance counters on a
real ARM platform. I found that the difference is very large. For example,
the I-cache misses per 1k instructions for BBench were about 30 when
measured with hardware performance counters (according to the paper
published by Anthony Gutierrez at IISWC 2011), but only about 3 in gem5.
That is roughly a 10x difference.
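
For reference, the metric above is misses per 1k instructions (MPKI), i.e.
1000 * misses / instructions. The sketch below computes it from a
whitespace-separated "name value" stats dump. The stat names and the sample
values are hypothetical placeholders, not real gem5 output; the 30 MPKI
hardware figure is the one quoted above.

```python
def read_stat(stats_text, name):
    """Return the value on the first line whose first field equals name."""
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            return float(fields[1])
    raise KeyError(name)


def mpki(misses, instructions):
    """Misses per 1000 committed instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 1000.0 * misses / instructions


# Hypothetical stats dump; names and numbers are illustrative only.
sample = """\
example.insts            2000000   # hypothetical stat name
example.icache.misses       6000   # hypothetical stat name
"""

sim_mpki = mpki(read_stat(sample, "example.icache.misses"),
                read_stat(sample, "example.insts"))
hw_mpki = 30.0                      # hardware figure quoted in the post
print(sim_mpki, hw_mpki / sim_mpki)
```

When comparing against hardware counters, both sides should count the same
events (for example, committed versus speculative instructions), otherwise
the ratio is not meaningful.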
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
