Depending on your ISA, instructions that miss in the TLB may be counted twice (if TLB misses are handled by a trap/software handler/restart mechanism, as on Alpha). We use TLBs even in SE mode to map the virtual address space into a contiguous physical space. So if you're using Alpha, the first thing I'd check is whether the instruction count discrepancy matches the number of TLB misses.
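A minimal sketch of that check, assuming the usual "name value # description" layout of stats.txt. The two TLB stat names are my assumption of what an Alpha build reports, so verify them against your own output; only sim_insts is taken from this thread. It compares the difference in instruction counts between two runs with the difference in their TLB miss counts:

```python
# Assumed stat names (not confirmed in this thread); check your own stats.txt.
TLB_MISS_STATS = ("system.cpu.itb.fetch_misses", "system.cpu.dtb.data_misses")


def parse_stats(text):
    """Parse 'name value # description' lines into a {name: number} dict."""
    stats = {}
    for line in text.splitlines():
        # Drop the trailing '# description' part, then split name and value.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            try:
                stats[fields[0]] = float(fields[1])
            except ValueError:
                pass  # banner lines and other non-numeric rows
    return stats


def tlb_misses(stats, names=TLB_MISS_STATS):
    """Sum the TLB miss counters that are present; missing ones count as 0."""
    return sum(stats.get(name, 0) for name in names)


def compare_runs(stats_a, stats_b):
    """Return (instruction-count delta, TLB-miss delta) between two runs."""
    inst_delta = stats_a["sim_insts"] - stats_b["sim_insts"]
    miss_delta = tlb_misses(stats_a) - tlb_misses(stats_b)
    return inst_delta, miss_delta
```

To use it, read the stats.txt of each run into a string, pass both through parse_stats, and hand the results to compare_runs. If the two deltas are close, double-counted TLB misses are a plausible explanation; if they are not, tracediff is still the way to find where the runs diverge.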
Other than that though, I agree, it's puzzling, but tracediff will tell you the answer.

Steve

On Thu, Jun 2, 2011 at 6:31 AM, Ali Saidi <sa...@umich.edu> wrote:
> Yes, it is. The only way to see what is going on is to use tracediff and see
> where the execution diverges.
>
> Ali
>
> On Jun 2, 2011, at 6:48 AM, Gustavo Henrique Nihei wrote:
>
> First, sorry for bringing back an old thread.
> But I'm still confused about this matter. I'm not running FS.
> So, when running an SE platform, isn't it weird that for a single
> application, with different cache configurations, the number of simulated
> instructions differs between simulations?
> I mean, if there's no underlying OS, just the application, I would expect
> the CPU to execute only the instructions provided by the app binary, or am
> I missing some point here?
> Thanks.
>
> On Tue, Jan 25, 2011 at 12:46 PM, Steve Reinhardt <ste...@gmail.com> wrote:
>>
>> Yes, it's almost impossible to get completely identical behavior without
>> running a completely identical system. Even making the cache larger will
>> make the program run faster in some phases, which will change where timer
>> interrupts happen with respect to program execution.
>> If you look at larger time windows and/or more samples, the mean behavior
>> should stabilize, but trying to correlate individual small samples like
>> you're doing is going to be extremely challenging.
>> This paper focuses on these issues in multiprocessor systems, but most of
>> what it talks about is relevant to uniprocessor systems running a full OS
>> too:
>> http://pages.cs.wisc.edu/~alaa/papers/ieeemicro03_variability.pdf
>>
>> Steve
>>
>> On Mon, Jan 24, 2011 at 10:34 PM, Stevenson Jian <stevensonj...@gmail.com>
>> wrote:
>>>
>>> Yes, I am running in FS mode. Is it normal for the OS to make that much
>>> difference?
>>> These statistics are taken after the benchmarks have started.
>>> Thanks!
>>> Steve
>>>
>>> On Tue, Jan 25, 2011 at 12:00 AM, Steve Reinhardt <ste...@gmail.com>
>>> wrote:
>>>>
>>>> OK, sorry for the confusion; since you were running a Parsec benchmark I
>>>> assumed the numbers were processor IDs. Are you running in FS mode? Are
>>>> these statistics taken from the beginning when Linux is booting, or are
>>>> they after the benchmark has started running?
>>>>
>>>> Steve
>>>>
>>>> On Mon, Jan 24, 2011 at 5:51 PM, Stevenson Jian
>>>> <stevensonj...@gmail.com> wrote:
>>>>>
>>>>> Thanks for replying Steve. I only used a single processor in both
>>>>> simulations. What is shown is not the output from individual processors,
>>>>> but that of the same processor at the end of every 100,000 instructions
>>>>> (see sim_insts increment 100,000 each time)
>>>>>
>>>>> On Mon, Jan 24, 2011 at 7:14 PM, Steve Reinhardt <ste...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> With a multiprocessor, seemingly small changes in configuration can
>>>>>> have a significant impact if it changes the order in which threads grab a
>>>>>> lock, or something like that. So in particular, for the stats you have
>>>>>> below, it seems likely that there's some serialized computation going on
>>>>>> that happened on processor 3 in the first case and on processor 5 in the
>>>>>> second case.
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Mon, Jan 24, 2011 at 1:30 PM, Stevenson Jian
>>>>>> <stevensonj...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> How does Timing CPU count number of instructions? If it stalls on a
>>>>>>> cache miss, do the Nops count as instructions as well? The reason why I
>>>>>>> ask is that by simply changing the size of the cache, the total number
>>>>>>> of instructions when the benchmark completes varies by about
>>>>>>> 0.1 - 0.01%.
>>>>>>> Another anomaly that I am observing is that again, by simply changing
>>>>>>> the size of the L2, the number of overall L2 accesses per let's say
>>>>>>> 100,000 instructions can vary by over 100%.
>>>>>>> The following are 2 runs that I did on m5 with the Freqmine
>>>>>>> benchmark. The first simulation uses a 1MB 4 way L2 with a latency of
>>>>>>> 6ns while the second simulation uses a 2MB 8 way L2 with a latency of
>>>>>>> 4.5ns. The overall accesses per 100,000 instructions are shown.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------------------------------
>>>>>>> 1MB 4Way L2:
>>>>>>> 2:
>>>>>>> sim_insts                    100200001   # Number of instructions simulated
>>>>>>> sim_ticks                    196940000   # Number of ticks simulated
>>>>>>> system.l2.overall_accesses        3231   # number of overall (read+write) accesses
>>>>>>> system.l2.overall_hits            2515   # number of overall hits
>>>>>>> 3:
>>>>>>> sim_insts                    100300001   # Number of instructions simulated
>>>>>>> sim_ticks                    227453000   # Number of ticks simulated
>>>>>>> system.l2.overall_accesses        4656   # number of overall (read+write) accesses
>>>>>>> system.l2.overall_hits            3434   # number of overall hits
>>>>>>> 4:
>>>>>>> sim_insts                    100400001   # Number of instructions simulated
>>>>>>> sim_ticks                    154064000   # Number of ticks simulated
>>>>>>> system.l2.overall_accesses        1078   # number of overall (read+write) accesses
>>>>>>> system.l2.overall_hits             722   # number of overall hits
>>>>>>> 5:
>>>>>>> sim_insts                    100500001   # Number of instructions simulated
>>>>>>> sim_ticks                    155779000   # Number of ticks simulated
>>>>>>> system.l2.overall_accesses        1575   # number of overall (read+write) accesses
>>>>>>> system.l2.overall_hits            1154   # number of overall hits
>>>>>>> ....
>>>>>>> 2MB 8Way L2:
>>>>>>> 2:
>>>>>>> sim_insts                    100200001   # Number of instructions simulated
>>>>>>> sim_ticks                    234810000   # Number of ticks simulated
>>>>>>> system.l2.overall_accesses        2936   # number of overall (read+write) accesses
>>>>>>> system.l2.overall_hits            1163   # number of overall hits
>>>>>>> 3:
>>>>>>> sim_insts                    100300000   # Number of instructions simulated
>>>>>>> sim_ticks                    174173000   # Number of ticks simulated
>>>>>>> system.l2.overall_accesses        1496   # number of overall (read+write) accesses
>>>>>>> system.l2.overall_hits             803   # number of overall hits
>>>>>>> 4:
>>>>>>> sim_insts                    100400000   # Number of instructions simulated
>>>>>>> sim_ticks                    190135000   # Number of ticks simulated
>>>>>>> system.l2.overall_accesses        2290   # number of overall (read+write) accesses
>>>>>>> system.l2.overall_hits            1672   # number of overall hits
>>>>>>> 5:
>>>>>>> sim_insts                    100500000   # Number of instructions simulated
>>>>>>> sim_ticks                    213086000   # Number of ticks simulated
>>>>>>> system.l2.overall_accesses        4554   # number of overall (read+write) accesses
>>>>>>> system.l2.overall_hits            3871   # number of overall hits
>>>>>>> .....
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------
>>>>>>> Even if Nops are counted as instructions, I don't see how that would
>>>>>>> make overall accesses per 100,000 instructions vary by as much as 200%.
>>>>>>> How does M5 count the number of instructions?
>>>>>>> Thanks,
>>>>>>> Steve
>>>>>>> _______________________________________________
>>>>>>> m5-users mailing list
>>>>>>> m5-us...@m5sim.org
>>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>
> --
> Gustavo Henrique Nihei
> LAPS - Laboratório de Automação do Projeto de Sistemas
> NIME - Núcleo Interdepartamental de Microeletrônica
> Universidade Federal de Santa Catarina
> Florianópolis - Santa Catarina - Brasil
> _______________________________________________
> gem5-users mailing list
> gem5-users@m5sim.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
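
P.S. The January 25 point about larger time windows can be tried on the numbers in the original post. This is only a rough sketch: it uses the four L2 access counts posted for each configuration (windows 2 to 5), which is far too few samples to conclude anything, and the variable and function names are mine. It compares the window-by-window change in system.l2.overall_accesses with the change over the four windows combined:

```python
# system.l2.overall_accesses per 100,000-instruction window, windows 2..5,
# copied from the two runs in the original post.
ACCESSES_1MB_4WAY = [3231, 4656, 1078, 1575]
ACCESSES_2MB_8WAY = [2936, 1496, 2290, 4554]


def relative_change(base, other):
    """Change of `other` relative to `base`, as a fraction of `base`."""
    return (other - base) / base


# Window-by-window comparison of the two configurations.
per_window = [
    relative_change(a, b)
    for a, b in zip(ACCESSES_1MB_4WAY, ACCESSES_2MB_8WAY)
]

# The same comparison over all four windows taken together.
combined = relative_change(sum(ACCESSES_1MB_4WAY), sum(ACCESSES_2MB_8WAY))

for window, change in enumerate(per_window, start=2):
    print(f"window {window}: {change:+.1%}")
print(f"windows 2-5 combined: {combined:+.1%}")
```

The individual windows differ by between about -68% and +189%, while the four windows combined (10540 versus 11276 accesses) differ by about 7%.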