huangyongbing <huangyongbing <at> ncic.ac.cn> writes:

> Hi all,
>
> I have to bring up the accuracy problem of gem5 again. When running BBench on gem5, I would like to know whether anybody has compared microarchitectural metrics, such as L1 instruction cache misses, measured in gem5 against those from a real hardware board. Which parameters should I change to obtain similar results with the arm_detailed CPU model? I have already tried adjusting several important gem5 configuration parameters, but I still cannot get the expected results.
>
> If the simulator's results differ greatly from the real platform, any CPU architecture optimizations evaluated on the simulator would be useless. I see that many users simulate the ARM platform with gem5. Has anybody run into the same problem?
>
> Thanks.
>
> Best regards,
> Yongbing Huang
>
> From: gem5-users-bounces <at> gem5.org [mailto:gem5-users-bounces <at> gem5.org] On Behalf Of huangyongbing
> Sent: Monday, January 28, 2013 10:11 AM
> To: 'gem5 users mailing list'
> Subject: [SPAM] Re: [gem5-users] Mismatched stats between gem5 and performance counters when running BBench on ARM platform
>
> Hi Orangeade,
>
> Thanks for your reply. I have in fact already done some work to localize the problem.
>
> 1) I use the arm_detailed mode in gem5. I also disabled the prefetcher in gem5, the same as on the real ARM platform.
> 2) I have already changed the default 64B cache line to a 32B cache line.
> 3) I was aware of this, so I ran a CPU-only micro-benchmark on both the ARM platform and gem5. The results look much the same as with BBench. I will look into this further.
> 4) The real ARM platform uses a round-robin cache replacement policy, but I use LRU in gem5. I don't know how much of the difference is caused by the replacement policy. As a next step I will implement round robin in gem5 and test again.
>
> Thanks!
>
> Best regards,
> Yongbing Huang
>
> From: gem5-users-bounces <at> gem5.org [mailto:gem5-users-bounces <at> gem5.org] On Behalf Of Mr. Orangeade
> Sent: Monday, January 28, 2013 3:07 AM
> To: gem5-users <at> gem5.org
> Subject: Re: [gem5-users] Mismatched stats between gem5 and performance counters when running BBench on ARM platform
>
> Hi Yongbing,
>
> I don't have a 100% solution for you, but I have a few questions which may help you localize the problem:
>
> (1) Which type of model (functional or arm_detailed) do you run to collect the stats? Theoretically you should run 'arm_detailed' to take speculative misses into account.
> (2) 'arm_detailed' uses a 64B cache line while the Cortex-A9 has a 32B cache line. Did you take this into account (i.e. change the 'arm_detailed' cache line size to 32B)?
> (3) I am not sure about Chromium, but browsers in general may use the GPU for compositing on real HW, so the execution path will differ from the SW-only BBench in gem5.
>
> Orangeade
>
> Yongbing wrote:
>
> Hi all,
>
> I recently compared micro-architectural metrics, such as L1 cache misses, collected by gem5 with those collected by performance counters on a real ARM platform. I found that the difference is very large. For example, the Icache misses per 1k instructions of bbench are about 30 when collected by hardware performance counters (referring to the paper published by Anthony Gutierrez in IISWC'2011), but only about 3 in gem5. That is about a 10x difference.
>
> _______________________________________________
> gem5-users mailing list
> gem5-users <at> gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
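Points (2) and (4) in the quoted messages raise the question of how much line size and replacement policy can move a miss count. As a rough way to build intuition, here is a minimal toy sketch of a set-associative cache. It is not gem5 code and does not model the Cortex-A9; the function name `count_misses` and its parameters are made up for illustration, and round robin is approximated by evicting in fill order, which assumes no invalidations.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes, line_bytes, ways, policy="lru"):
    """Count misses of a toy set-associative cache on a byte-address trace.

    Hypothetical helper for illustration only; it is not part of gem5.
    """
    if policy not in ("lru", "round_robin"):
        raise ValueError("policy must be 'lru' or 'round_robin'")
    num_sets = cache_bytes // (line_bytes * ways)
    # One ordered dict per set: keys are tags, the first key is the next victim.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        index, tag = line % num_sets, line // num_sets
        entries = sets[index]
        if tag in entries:
            if policy == "lru":
                # Only LRU refreshes a line on a hit; round robin ignores hits.
                entries.move_to_end(tag)
            continue
        misses += 1
        if len(entries) == ways:
            entries.popitem(last=False)  # evict the line at the front
        entries[tag] = None
    return misses


if __name__ == "__main__":
    # Sequential 4 KB scan: halving the line size doubles the compulsory misses.
    scan = range(0, 4096, 4)
    for line_bytes in (32, 64):
        print(line_bytes, "B lines:", count_misses(scan, 32 * 1024, line_bytes, 4))

    # Tiny 2-way, single-set cache where the two policies disagree.
    trace = [0, 32, 0, 64, 0]
    for policy in ("lru", "round_robin"):
        print(policy, count_misses(trace, 64, 32, 2, policy))
```

On these synthetic traces the line size changes the miss count by 2x and the policy by one miss out of five accesses. That says nothing definite about BBench, but it suggests that checking each parameter in isolation on a small trace is cheap before rerunning a full-system simulation.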
Hi,

Have you figured out what went wrong to cause this mismatch? I have exactly the same problem. Here is the command line I ran:

    build/ARM/gem5.fast configs/example/fs.py -b bbench-gb --kernel=vmlinux.smp.mouse.arm --cpu-type=arm_detailed --caches --l2cache --frame-capture

And here are some of the statistics from the stats.txt dumped by gem5:

    system.cpu.commit.committedInsts          49093815561   # Number of instructions committed
    system.cpu.itb.inst_misses                   16320412   # ITB inst misses
    system.cpu.icache.overall_misses::total     164538082   # number of overall misses

The MPKI (misses per thousand instructions) I get for the icache and ITLB are 3.35 and 0.33 respectively, compared with around 30 and 3 in the paper published by Anthony Gutierrez in IISWC'2011. Although I used a single core, a 64B block size, and did not disable the prefetcher (the default arm_detailed CPU), all of which differ from the Cortex-A9, I don't think that could make an order-of-magnitude difference.

Thank you!

Best regards,
Xiaowan
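For anyone repeating the MPKI arithmetic above, here is a small sketch that recomputes both figures from the three stats lines quoted in this message. The helper names `parse_stats` and `mpki` are made up for illustration, and the parser assumes the simple "name value # comment" layout shown above. If a stats.txt contains several dumps, later values overwrite earlier ones in this sketch.

```python
def parse_stats(text):
    """Parse 'name value # comment' lines into a dict of floats.

    Hypothetical helper; it assumes the simple layout quoted in the message.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip headers and other non-numeric lines
    return stats


def mpki(misses, committed_insts):
    """Misses per thousand committed instructions."""
    return 1000.0 * misses / committed_insts


# The three lines quoted in the message above.
SAMPLE = """\
system.cpu.commit.committedInsts 49093815561 # Number of instructions committed
system.cpu.itb.inst_misses 16320412 # ITB inst misses
system.cpu.icache.overall_misses::total 164538082 # number of overall misses
"""

stats = parse_stats(SAMPLE)
insts = stats["system.cpu.commit.committedInsts"]
icache_mpki = mpki(stats["system.cpu.icache.overall_misses::total"], insts)
itlb_mpki = mpki(stats["system.cpu.itb.inst_misses"], insts)
print("icache MPKI: %.2f" % icache_mpki)
print("ITLB MPKI:   %.2f" % itlb_mpki)
```

These come out at 3.35 and 0.33, matching the figures quoted above. Whether the hardware counter events behind the paper's numbers count the same kinds of accesses as these gem5 stats is a separate question that this sketch does not answer.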
