huangyongbing <huangyongbing <at> ncic.ac.cn> writes:

> 
> 
> Hi all,
>  
>        I have to bring up the accuracy problem of gem5 again. When running 
BBench on gem5, I would like to know whether anybody has compared 
microarchitectural metrics, such as L1 instruction cache misses, measured in 
gem5 against those from a real hardware board, and which parameters I should 
change to obtain similar results with the arm_detailed CPU model. I have 
already tried adjusting several important gem5 configuration parameters, 
but I still cannot reproduce the hardware results.
>        If the simulator results differ substantially from the real 
platform, all CPU architecture optimizations based on the simulator 
would be useless. I see that many users simulate ARM platforms with 
gem5. Has anybody else run into the same problem?
>  
>        Thanks.
>  
> Best regards,
> Yongbing Huang
>  
>  
> From: gem5-users-bounces <at> gem5.org [mailto:gem5-users-bounces <at> 
gem5.org] On Behalf Of huangyongbing
> Sent: Monday, January 28, 2013 10:11 AM
> To: 'gem5 users mailing list'
> Subject: [SPAM] Re: [gem5-users] Mismatched stats between gem5 and 
performance counters when running BBench on ARM platform
>  
> Hi Orangeade,
>  
>          Thanks for your reply. I have in fact done some work to localize 
the problem.
>  
> 1)       I use the arm_detailed model in gem5. I also disabled the prefetcher 
in gem5, as on the real ARM platform.
> 2)       I have already changed the default 64B cache line to a 32B cache line.
> 3)       I was aware of this, so I ran a CPU-only micro-benchmark on both the 
ARM platform and gem5. The results seem to be the same as when running BBench. I 
will look into this further.
> 4)       The real ARM platform uses a round-robin cache replacement policy, 
but I use the LRU replacement policy in gem5. I don't know how much 
effect the replacement policy has. As a next step, I will implement round 
robin in gem5 and test again.
>  
> Thanks!
>  
> Best regards,
>  
> Yongbing Huang
>  
> From: gem5-users-bounces <at> gem5.org [mailto:gem5-users-bounces <at> 
gem5.org] On Behalf Of Mr. Orangeade
> Sent: Monday, January 28, 2013 3:07 AM
> To: gem5-users <at> gem5.org
> Subject: Re: [gem5-users] Mismatched stats between gem5 and performance 
counters when running BBench on ARM platform
>  
> Hi Yongbing,
>  
> I don't have any 100% solution for you but have a few questions which may 
help you to localize the problem:
>  
> (1) Which type of model (functional or arm_detailed) do you run to collect 
the stats? Theoretically you should run 'arm_detailed' to take into account 
speculative misses.
> (2) 'arm_detailed' uses 64B cache line while Cortex-A9 has 32B cache line. 
Did you take this into account (i.e. changed 'arm_detailed' cache line size 
to 32B)?
> (3) Not sure about Chromium but browsers in general may use GPU for 
compositing on real HW and execution path will be different comparing to 
SW-only BBench in gem5.
>  
> Orangeade
>  
> Yongbing wrote:
>  
> Hi all,
>  
> I recently compared the micro-architectural metrics such as L1 cache miss 
collected by gem5 with that collected by performance counters on real ARM 
platform. I found that their difference was so big. For example, the Icache 
miss rate per 1k instruction of bbench was about 30 collected by hardware 
performance counters (referring to the paper published by Anthony Gutierrez 
in IISWC'2011), but only about 3 for gem5. It's about 10x difference.
> 
> 
> _______________________________________________
> gem5-users mailing list
> gem5-users <at> gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Hi,

Have you figured out what causes this mismatch? I have exactly the same 
problem. Here is the command line I ran:

build/ARM/gem5.fast configs/example/fs.py -b bbench-gb \
    --kernel=vmlinux.smp.mouse.arm --cpu-type=arm_detailed --caches --l2cache \
    --frame-capture

And here are some of the statistics from stats.txt dumped by gem5:

system.cpu.commit.committedInsts          49093815561   # Number of instructions committed
system.cpu.itb.inst_misses                   16320412   # ITB inst misses
system.cpu.icache.overall_misses::total     164538082   # number of overall misses

The icache and ITLB MPKI (misses per thousand instructions) I get are 
3.35 and 0.33 respectively, compared with 30 and 3 as shown in the 
paper published by Anthony Gutierrez in IISWC’2011.
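
For anyone who wants to check the arithmetic, here is a minimal Python sketch 
that recomputes both MPKI values from the three stats lines quoted above. The 
helper names (parse_stats, mpki) are made up for this example and are not part 
of gem5; the parser only handles integer counters, and the hardware figure of 
30 is simply the number quoted in this thread.

```python
import re

# The three stats lines from this message, as dumped in stats.txt.
STATS_TEXT = """\
system.cpu.commit.committedInsts          49093815561   # Number of instructions committed
system.cpu.itb.inst_misses                   16320412   # ITB inst misses
system.cpu.icache.overall_misses::total     164538082   # number of overall misses
"""


def parse_stats(text):
    """Map each stat name to its integer value (hypothetical helper).

    Only integer counters are handled; the trailing '# ...' comment is ignored.
    """
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+)\s+(\d+)\b", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def mpki(misses, instructions):
    """Misses per thousand committed instructions."""
    return 1000.0 * misses / instructions


stats = parse_stats(STATS_TEXT)
insts = stats["system.cpu.commit.committedInsts"]
icache_mpki = mpki(stats["system.cpu.icache.overall_misses::total"], insts)
itlb_mpki = mpki(stats["system.cpu.itb.inst_misses"], insts)

# Assumption: 30 is the hardware icache MPKI quoted in this thread.
HW_ICACHE_MPKI = 30.0
ratio = HW_ICACHE_MPKI / icache_mpki

print(f"icache MPKI: {icache_mpki:.2f}")
print(f"ITLB MPKI:   {itlb_mpki:.2f}")
print(f"hardware / gem5 icache MPKI ratio: {ratio:.1f}x")
```

Both values are normalized by committed instructions, which is how the 3.35 
and 0.33 above were obtained; misses caused by wrong-path fetches are still 
counted in the numerator.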

Although I used a single core and a 32B block size and did not disable the 
prefetcher, I don't think these could make an order-of-magnitude difference.

Thank you!

Best regards,
Xiaowan




