Hey Abbas,

Please always reply to the gem5 mailing list, and CC me when appropriate.

I can understand why you would like to have a fixed number.

I think the stats can vary due to a very large number of complex factors. Some of the changes may make the model more accurate, for others no one knows, and others are just bugs.

This can also be observed from the fact that the stats checks have been CHANGED for a long time, e.g.: https://www.mail-archive.com/[email protected]/msg26855.html Changes happen so often that devs haven't found the time to properly understand and justify them.

My recommendation is that you re-run your old experiments on the newer gem5 version, and compare everything there.

gem5 is not a cycle-accurate system simulator, so absolute values or small variations are not meaningful in general.

This also teaches us that results obtained with small margins are generally not meaningful for publication, since the noise is too great. What that error margin is, I don't know.

On Tue, Sep 11, 2018 at 3:51 PM Abbas Fairouz <[email protected]> wrote:
>
> Hi Ciro,
>
> Thanks for your reply.
>
> The reason I was asking about the differences between these two versions of
> GEM5 is that I published a paper two years ago using the old GEM5
> version. Now, I need to do more experiments on GEM5 using new memory
> technologies (such as HBM). I'm getting different results in the new GEM5
> version, for the same benchmarks I used in the old GEM5 version.
>
> Is it because the new GEM5 has more accurate:
> 1) Memory modeling?
> 2) Cache modeling?
> 3) CPU modeling?
>
> Best regards,
> Abbas Fairouz
>
> -------------------------------------------------
> Abbas Fairouz, PhD candidate
> Dept. of ECE, Texas A&M University
> College Station, TX 77843, USA
> -------------------------------------------------
>
> On Wed, Sep 5, 2018 at 4:52 AM, Ciro Santilli <[email protected]> wrote:
>>
>> Thanks for the detailed report,
>>
>> I recommend that if you really care about this difference, then do a
>> bisection of gem5 and pinpoint which commit introduced it, and then tell us
>> which one it was, possibly also pinging the author for clarification.
>>
>> If you are not familiar with bisection, here is a detailed example that you
>> should be able to adapt easily for this use case:
>> https://github.com/cirosantilli/linux-kernel-module-cheat/tree/83b36867cf06ffdca3ce04296a8568d4f37ea13b#bisection
>>
>> On Tue, Sep 4, 2018 at 8:50 PM Abbas Fairouz <[email protected]> wrote:
>>>
>>> Hi guys,
>>>
>>> I have simulated a simple "hello world" example on two different versions
>>> of GEM5. I got different "system.cpu.numCycles" results in the two
>>> simulations. In both simulations, I used the same configuration
>>> (Linux image, VM, caches, etc.).
>>>
>>> I will list the parts of the configuration files and "stats.txt" files for
>>> both simulations.
>>>
>>> They have the same path to ~/gem5/system files.
>>> I ran them on the same configuration: FS mode, O3 CPU, CPU speed 2 GHz,
>>> DDR3_1600, L2 cache.
>>>
>>> Running script is "test.rcS":
>>>
>>> /sbin/m5 resetstats
>>> echo "Start"
>>> echo `ls`
>>> cd test
>>> ./a.out
>>> echo "Bye"
>>> /sbin/m5 exit
>>>
>>> "a.out" is a binary code of "hello.c" file:
>>>
>>> #include <stdio.h>
>>> int main()
>>> {
>>>     //printf() displays the string inside quotation
>>>     printf("Hello, World!\n");
>>>     int x = 100 + 5 * 23 - 16 + 6 * 44 - 289 / 4;
>>>     printf("X = %d\n", x);
>>>     return 0;
>>> }
>>>
>>> ==========================================================
>>>
>>> Old GEM5 (gem5-stable-0e86fac7254c)
>>>
>>> In "configs/common/FSConfig.py":
>>>
>>> # Command line
>>> self.boot_osflags = 'earlyprintk=ttyS0 console=ttyS0 lpj=7999923 ' + \
>>>                     'root=/dev/hda1'
>>> # abbas
>>> #self.kernel = binary('x86_64-vmlinux-2.6.22.9')
>>> self.kernel = binary('x86_64-vmlinux-2.6.22.9.smp')
>>> #self.kernel = binary('x86_64-vmlinux-2.6.28.4-smp')
>>> return self
>>>
>>> In "configs/common/Benchmarks.py":
>>>
>>> elif buildEnv['TARGET_ISA'] == 'x86':
>>>     # abbas
>>>     #return env.get('LINUX_IMAGE', disk('x86root.img'))
>>>     return env.get('LINUX_IMAGE', disk('x86root-taco.img'))
>>>
>>> In "configs/common/Simulation.py":
>>>
>>> elif options.fast_forward:
>>>     CPUClass = TmpClass
>>>     # Abbas
>>>     #TmpClass = AtomicSimpleCPU
>>>     #test_mem_mode = 'atomic'
>>>     TmpClass = TimingSimpleCPU
>>>     test_mem_mode = 'timing'
>>>
>>> Running GEM5 command:
>>>
>>> ./build/X86/gem5.opt -d m5out/test ./configs/example/fs.py --caches
>>> --l2cache --l1d_size=128kB --script=myscripts/test.rcS
>>> --mem-type=DDR3_1600_x64 --restore-with-cpu=detailed
>>>
>>> GEM5 terminal (tail):
>>>
>>> TCP cubic registered
>>> NET: Registered protocol family 1
>>> NET: Registered protocol family 10
>>> IPv6 over IPv4 tunneling driver
>>> NET: Registered protocol family 17
>>> EXT2-fs warning: maximal mount count reached, running e2fsck is recommended
>>> VFS: Mounted root (ext2 filesystem).
>>> Freeing unused kernel memory: 248k freed
>>> mounting filesystems...
>>> loading script...
>>> Start
>>> benches bin boot dev etc home lib lib32 lib64 linuxrc lost+found mnt normal
>>> opt parsec proc real root sbin sys test tmp usr var
>>> Hello
>>> X = 391
>>> Bye
>>>
>>> In "stats.txt" file:
>>>
>>> system.cpu.apic_clk_domain.clock 8000 # Clock period in ticks
>>> system.cpu.numCycles 4273712 # number of cpu cycles simulated
>>> system.cpu.numWorkItemsStarted 0 # number of work items this cpu started
>>> system.cpu.numWorkItemsCompleted 0 # number of work items this cpu completed
>>> system.cpu.committedInsts 1954222 # Number of instructions committed
>>> system.cpu.committedOps 3584009 # Number of ops (including micro ops) committed
>>> system.cpu.num_int_alu_accesses 3508387 # Number of integer alu accesses
>>> system.cpu.num_fp_alu_accesses 21132 # Number of float alu accesses
>>> system.cpu.num_func_calls 85033 # number of times a function call or return occured
>>> system.cpu.num_conditional_control_insts 254623 # number of instructions that are conditional controls
>>> system.cpu.num_int_insts 3508387 # number of integer instructions
>>> system.cpu.num_fp_insts 21132 # number of float instructions
>>> system.cpu.num_int_register_reads 7285240 # number of times the integer registers were read
>>> system.cpu.num_int_register_writes 2775300 # number of times the integer registers were written
>>> system.cpu.num_fp_register_reads 35511 # number of times the floating registers were read
>>> system.cpu.num_fp_register_writes 16891 # number of times the floating registers were written
>>> system.cpu.num_cc_register_reads 1862494 # number of times the CC registers were read
>>> system.cpu.num_cc_register_writes 1160708 # number of times the CC registers were written
>>> system.cpu.num_mem_refs 885650 # number of memory refs
>>> system.cpu.num_load_insts 499134 # Number of load instructions
>>> system.cpu.num_store_insts 386516 # Number of store instructions
>>> system.cpu.num_idle_cycles 109958.492414 # Number of idle cycles
>>> system.cpu.num_busy_cycles 4163753.507586 # Number of busy cycles
>>> system.cpu.not_idle_fraction 0.974271 # Percentage of non-idle cycles
>>> system.cpu.idle_fraction 0.025729 # Percentage of idle cycles
>>> system.cpu.Branches 374315 # Number of branches fetched
>>> system.cpu.op_class::No_OpClass 22624 0.63% 0.63% # Class of executed instruction
>>> system.cpu.op_class::IntAlu 2647876 73.88% 74.51% # Class of executed instruction
>>> system.cpu.op_class::IntMult 6228 0.17% 74.68% # Class of executed instruction
>>> system.cpu.op_class::IntDiv 3691 0.10% 74.78% # Class of executed instruction
>>> system.cpu.op_class::FloatAdd 18119 0.51% 75.29% # Class of executed instruction
>>> system.cpu.op_class::FloatCmp 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::FloatCvt 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::FloatMult 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::FloatDiv 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::FloatSqrt 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::SimdAdd 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::SimdAddAcc 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::SimdAlu 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::SimdCmp 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::SimdCvt 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::SimdMisc 0 0.00% 75.29% # Class of executed instruction
>>> system.cpu.op_class::SimdMult 0 0.00% 75.29% # Class of executed instruction
>>>
>>> New GEM5
>>>
>>> In "configs/common/FSConfig.py":
>>>
>>> # Command line
>>> if not cmdline:
>>>     cmdline = 'earlyprintk=ttyS0 console=ttyS0 lpj=7999923 root=/dev/hda1'
>>> self.boot_osflags = fillInCmdline(mdesc, cmdline)
>>> # abbas
>>> #self.kernel = binary('x86_64-vmlinux-2.6.22.9')
>>> self.kernel = binary('x86_64-vmlinux-2.6.22.9.smp')
>>> #self.kernel = binary('x86_64-vmlinux-2.6.28.4-smp')
>>> return self
>>>
>>> In "configs/common/Benchmarks.py":
>>>
>>> elif buildEnv['TARGET_ISA'] == 'x86':
>>>     # abbas
>>>     #return env.get('LINUX_IMAGE', disk('x86root.img'))
>>>     #return env.get('LINUX_IMAGE', disk('linux-x86.img'))
>>>     return env.get('LINUX_IMAGE', disk('x86root-taco.img'))
>>>
>>> In "configs/common/Simulation.py":
>>>
>>> elif options.fast_forward:
>>>     CPUClass = TmpClass
>>>     # Abbas
>>>     #TmpClass = AtomicSimpleCPU
>>>     #test_mem_mode = 'atomic'
>>>     TmpClass = TimingSimpleCPU
>>>     test_mem_mode = 'timing'
>>>
>>> Running GEM5 command:
>>>
>>> ./build/X86/gem5.opt -d m5out/test ./configs/example/fs.py --caches
>>> --l2cache --l1d_size=128kB --script=myscripts/test.rcS
>>> --mem-type=DDR3_1600_8x8 --restore-with-cpu=DerivO3CPU
>>>
>>> GEM5 terminal (tail):
>>>
>>> TCP cubic registered
>>> NET: Registered protocol family 1
>>> NET: Registered protocol family 10
>>> IPv6 over IPv4 tunneling driver
>>> NET: Registered protocol family 17
>>> EXT2-fs warning: maximal mount count reached, running e2fsck is recommended
>>> VFS: Mounted root (ext2 filesystem).
>>> Freeing unused kernel memory: 248k freed
>>> mounting filesystems...
>>> loading script...
>>> Start
>>> benches bin boot dev etc home lib lib32 lib64 linuxrc lost+found mnt normal
>>> opt parsec proc real root sbin sys test tmp usr var
>>> Hello
>>> X = 391
>>> Bye
>>>
>>> In "stats.txt" file:
>>>
>>> system.cpu_voltage_domain.voltage 1 # Voltage in Volts
>>> system.cpu_clk_domain.clock 500 # Clock period in ticks
>>> system.cpu.dtb.walker.pwrStateResidencyTicks::UNDEFINED 5141035093500 # Cumulative time (in ticks) in various power states
>>> system.cpu.dtb.rdAccesses 497427 # TLB accesses on read requests
>>> system.cpu.dtb.wrAccesses 384596 # TLB accesses on write requests
>>> system.cpu.dtb.rdMisses 434 # TLB misses on read requests
>>> system.cpu.dtb.wrMisses 163 # TLB misses on write requests
>>> system.cpu.apic_clk_domain.clock 8000 # Clock period in ticks
>>> system.cpu.interrupts.pwrStateResidencyTicks::UNDEFINED 5141035093500 # Cumulative time (in ticks) in various power states
>>> system.cpu.itb.walker.pwrStateResidencyTicks::UNDEFINED 5141035093500 # Cumulative time (in ticks) in various power states
>>> system.cpu.itb.rdAccesses 0 # TLB accesses on read requests
>>> system.cpu.itb.wrAccesses 2532817 # TLB accesses on write requests
>>> system.cpu.itb.rdMisses 0 # TLB misses on read requests
>>> system.cpu.itb.wrMisses 640 # TLB misses on write requests
>>> system.cpu.numPwrStateTransitions 64 # Number of power state transitions
>>> system.cpu.pwrStateClkGateDist::samples 32 # Distribution of time spent in the clock gated state
>>> system.cpu.pwrStateClkGateDist::mean 1344463.875000 # Distribution of time spent in the clock gated state
>>> system.cpu.pwrStateClkGateDist::stdev 1757712.048093 # Distribution of time spent in the clock gated state
>>> system.cpu.pwrStateClkGateDist::1000-5e+10 32 100.00% 100.00% # Distribution of time spent in the clock gated state
>>> system.cpu.pwrStateClkGateDist::min_value 219525 # Distribution of time spent in the clock gated state
>>> system.cpu.pwrStateClkGateDist::max_value 4847757 # Distribution of time spent in the clock gated state
>>> system.cpu.pwrStateClkGateDist::total 32 # Distribution of time spent in the clock gated state
>>> system.cpu.pwrStateResidencyTicks::ON 2768793027 # Cumulative time (in ticks) in various power states
>>> system.cpu.pwrStateResidencyTicks::CLK_GATED 43022844 # Cumulative time (in ticks) in various power states
>>> system.cpu.numCycles 4233161 # number of cpu cycles simulated
>>> system.cpu.numWorkItemsStarted 0 # number of work items this cpu started
>>> system.cpu.numWorkItemsCompleted 0 # number of work items this cpu completed
>>> system.cpu.kern.inst.arm 0 # number of arm instructions executed
>>> system.cpu.kern.inst.quiesce 0 # number of quiesce instructions executed
>>> system.cpu.committedInsts 1956251 # Number of instructions committed
>>> system.cpu.committedOps 3569940 # Number of ops (including micro ops) committed
>>> system.cpu.num_int_alu_accesses 3492413 # Number of integer alu accesses
>>> system.cpu.num_fp_alu_accesses 21132 # Number of float alu accesses
>>> system.cpu.num_vec_alu_accesses 0 # Number of vector alu accesses
>>> system.cpu.num_func_calls 84965 # number of times a function call or return
>>>
>>> ==========================================================
>>>
>>> Can anyone explain to me why the two simulations do not have the same
>>> number of cycles?
>>>
>>> Old GEM5: system.cpu.numCycles 4273712
>>> New GEM5: system.cpu.numCycles 4233161
>>>
>>> Best regards,
>>> Abbas Fairouz
>>>
>>> -------------------------------------------------
>>> Abbas Fairouz, PhD candidate
>>> Dept. of ECE, Texas A&M University
>>> College Station, TX 77843, USA
>>> -------------------------------------------------
>>>
>>> _______________________________________________
>>> gem5-users mailing list
>>> [email protected]
>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>
>
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
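
As a rough illustration of the "compare everything" recommendation in this thread, here is a minimal Python sketch that parses two stats dumps and prints the relative change of every stat present in both. The helper names `parse_stats` and `relative_diffs` are hypothetical (they are not part of gem5), and the sketch assumes each stats.txt line has the form "name value ... # description", as in the excerpts quoted above. The sample values are the numCycles, committedInsts and committedOps figures from the two runs.

```python
import re

# Assumed stats.txt line shape: "<name> <numeric value> ... # <description>".
STAT_LINE = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\b")


def parse_stats(text):
    """Return {stat_name: value} for every line that starts with a name and a number."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


def relative_diffs(old, new):
    """Return {name: (old, new, (new - old) / old)} for stats found in both runs."""
    diffs = {}
    for name in sorted(old.keys() & new.keys()):
        if old[name] != 0:  # skip stats whose relative change is undefined
            diffs[name] = (old[name], new[name], (new[name] - old[name]) / old[name])
    return diffs


# Values copied from the two stats.txt excerpts in this thread.
OLD_STATS = """\
system.cpu.numCycles 4273712 # number of cpu cycles simulated
system.cpu.committedInsts 1954222 # Number of instructions committed
system.cpu.committedOps 3584009 # Number of ops (including micro ops) committed
"""
NEW_STATS = """\
system.cpu.numCycles 4233161 # number of cpu cycles simulated
system.cpu.committedInsts 1956251 # Number of instructions committed
system.cpu.committedOps 3569940 # Number of ops (including micro ops) committed
"""

if __name__ == "__main__":
    diffs = relative_diffs(parse_stats(OLD_STATS), parse_stats(NEW_STATS))
    for name, (old, new, rel) in diffs.items():
        print(f"{name}: {old:.0f} -> {new:.0f} ({rel:+.2%})")
```

For the numbers in this thread, numCycles changes by a bit under 1% between the two versions. To compare full runs, the two strings can be replaced by the contents of each run's stats.txt file.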
