Ah, if you're running a multithreaded program, then yes, it's not at all surprising that the instruction counts change. Not many people run multithreaded programs under SE mode, which is why I was confused.
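Here is one plausible mechanism. It is an assumption, not something confirmed in this thread. With no OS to block on, threads waiting for a lock in an SE-mode program typically spin, and every spin iteration commits real instructions. The toy model below is plain Python, not gem5 code, and the per-iteration cycle and instruction costs are made-up parameters. It shows how a memory-latency change alone can shift the committed-instruction total while the useful work stays identical.

```python
def spin_instructions(critical_section_cycles, spin_iteration_cycles=4, insts_per_iteration=3):
    """Toy model (hypothetical parameters): instructions one waiting thread
    commits while spinning on a lock.

    Assumes the waiter starts spinning when the lock holder enters a critical
    section lasting `critical_section_cycles`, and that each spin iteration
    (e.g. load, compare, branch) takes `spin_iteration_cycles` cycles and
    commits `insts_per_iteration` instructions.
    """
    # Iterations needed to cover the whole critical section (ceiling division).
    iterations = -(-critical_section_cycles // spin_iteration_cycles)
    return iterations * insts_per_iteration


def total_instructions(useful_insts, critical_section_cycles):
    """Useful work is the same in every configuration; only the spin count varies."""
    return useful_insts + spin_instructions(critical_section_cycles)


useful = 1_000_000
# Hypothetical: 50 memory accesses inside the critical section, with two
# different per-access latencies standing in for two cache configurations.
for name, latency in [("slower L2", 12), ("faster L2", 9)]:
    cs_cycles = 50 * latency
    print(f"{name}: {total_instructions(useful, cs_cycles)} instructions")
# slower L2: 1000450 instructions
# faster L2: 1000339 instructions
```

Under these assumptions, a single-threaded program has no spin loop, so its count would not move. That matches the point that only single-threaded SE-mode runs should be repeatable.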
It's really only single-threaded SE-mode programs that should have repeatable instruction counts across different configurations.

Steve

On Sat, Jun 4, 2011 at 6:30 AM, Gustavo Henrique Nihei <ghni...@gmail.com> wrote:
> Thanks! I'll take some time to analyze the execution traces.
> In fact, I'm using the SPARC ISA, but I suppose it works in a similar way to Alpha.
> One thing I forgot to mention: I'm using the m5threads library.
> I think the discrepancy might be caused by the thread synchronization mechanism.
> As soon as I discover the cause, I'll report back here.
> On Thu, Jun 2, 2011 at 11:04 AM, Steve Reinhardt <ste...@gmail.com> wrote:
>>
>> Depending on your ISA, instructions that miss in the TLB may be counted twice (if it's a trap/SW handler/restart mechanism, like Alpha), and we use TLBs even in SE mode to map the virtual space into a contiguous physical space. So if you're using Alpha, the first thing I'd check is whether the instruction count discrepancy matches the number of TLB misses.
>>
>> Other than that, though, I agree, it's puzzling, but tracediff will tell you the answer.
>>
>> Steve
>>
>> On Thu, Jun 2, 2011 at 6:31 AM, Ali Saidi <sa...@umich.edu> wrote:
>> > Yes, it is. The only way to see what is going on is to use tracediff and see where the execution diverges.
>> > Ali
>> > On Jun 2, 2011, at 6:48 AM, Gustavo Henrique Nihei wrote:
>> >
>> > First, sorry for bringing back an old thread.
>> > But I'm still confused by this matter. I'm not running FS.
>> > So, when running on an SE platform, isn't it weird that for a single application and different cache configurations, the number of simulated instructions differs between simulations?
>> > I mean, if there's no underlying OS, just the application, I would expect the CPU to execute only the instructions provided by the app binary. Or am I missing something here?
>> > Thanks.
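Steve's first check is simple arithmetic. This minimal sketch assumes you have already read two numbers for each run from its stats output: the total committed-instruction count and the total TLB-miss count (instruction plus data TLB). No particular stat names are assumed, since they vary by ISA and version. The function name and the `tolerance` parameter are hypothetical. The check encodes the reading that each TLB miss adds about one extra counted instruction.

```python
def discrepancy_explained_by_tlb(insts_a, tlb_misses_a, insts_b, tlb_misses_b, tolerance=0):
    """Return True if the difference in instruction counts between runs A and B
    matches the difference in TLB misses, within `tolerance`.

    Assumption: on ISAs that handle TLB misses with a trap/SW handler/restart
    mechanism (like Alpha), an instruction that misses in the TLB may be counted
    twice, i.e. roughly one extra counted instruction per miss.
    """
    inst_delta = insts_a - insts_b
    miss_delta = tlb_misses_a - tlb_misses_b
    return abs(inst_delta - miss_delta) <= tolerance
```

For example, with made-up counts, `discrepancy_explained_by_tlb(1_000_120, 520, 1_000_000, 400)` returns `True`, because both deltas are 120. If the check fails, the TLB is not the whole story, and a trace comparison is the next step.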
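tracediff does this comparison on real M5 traces. Its options are not reproduced here. To illustrate the idea only, here is a hypothetical stand-in that walks two instruction traces in lockstep and reports the first entry where they differ. It assumes each trace line begins with a `<tick>:` field. That field is stripped before comparing, because timing legitimately differs between cache configurations even when the instruction stream does not.

```python
from itertools import zip_longest


def strip_tick(line):
    """Drop a leading '<tick>:' field (an assumed trace-line format)."""
    head, sep, rest = line.partition(":")
    if sep and head.strip().isdigit():
        return rest.strip()
    return line.strip()


def first_divergence(trace_a, trace_b):
    """Return (index, line_a, line_b) for the first differing entry, or None
    if the traces match. If one trace is shorter, its missing line is None."""
    for i, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        a_key = strip_tick(a) if a is not None else None
        b_key = strip_tick(b) if b is not None else None
        if a_key != b_key:
            return i, a, b
    return None
```

The lines surrounding the reported index are where to look for the cause, such as a spin loop or a TLB-miss restart.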
>> > On Tue, Jan 25, 2011 at 12:46 PM, Steve Reinhardt <ste...@gmail.com> wrote:
>> >>
>> >> Yes, it's almost impossible to get completely identical behavior without running a completely identical system. Even making the cache larger will make the program run faster in some phases, which will change where timer interrupts happen with respect to program execution.
>> >> If you look at larger time windows and/or more samples, the mean behavior should stabilize, but trying to correlate individual small samples like you're doing is going to be extremely challenging.
>> >> This paper focuses on these issues in multiprocessor systems, but most of what it talks about is relevant to uniprocessor systems running a full OS too:
>> >> http://pages.cs.wisc.edu/~alaa/papers/ieeemicro03_variability.pdf
>> >> Steve
>> >>
>> >> On Mon, Jan 24, 2011 at 10:34 PM, Stevenson Jian <stevensonj...@gmail.com> wrote:
>> >>>
>> >>> Yes, I am running in FS mode. Is it normal for the OS to make that much difference?
>> >>> These statistics are taken after the benchmarks have started.
>> >>> Thanks!
>> >>> Steve
>> >>> On Tue, Jan 25, 2011 at 12:00 AM, Steve Reinhardt <ste...@gmail.com> wrote:
>> >>>>
>> >>>> OK, sorry for the confusion; since you were running a Parsec benchmark I assumed the numbers were processor IDs. Are you running in FS mode? Are these statistics taken from the beginning when Linux is booting, or are they after the benchmark has started running?
>> >>>> Steve
>> >>>>
>> >>>> On Mon, Jan 24, 2011 at 5:51 PM, Stevenson Jian <stevensonj...@gmail.com> wrote:
>> >>>>>
>> >>>>> Thanks for replying, Steve. I only used a single processor in both simulations.
>> >>>>> What is shown is not the output from individual processors, but that of the same processor at the end of every 100,000 instructions (note that sim_insts increases by 100,000 each time).
>> >>>>>
>> >>>>> On Mon, Jan 24, 2011 at 7:14 PM, Steve Reinhardt <ste...@gmail.com> wrote:
>> >>>>>>
>> >>>>>> With a multiprocessor, seemingly small changes in configuration can have a significant impact if they change the order in which threads grab a lock, or something like that. So in particular, for the stats you have below, it seems likely that there's some serialized computation going on that happened on processor 3 in the first case and on processor 5 in the second case.
>> >>>>>> Steve
>> >>>>>>
>> >>>>>> On Mon, Jan 24, 2011 at 1:30 PM, Stevenson Jian <stevensonj...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Hi,
>> >>>>>>> How does the Timing CPU count the number of instructions? If it stalls on a cache miss, do the Nops count as instructions as well? The reason I ask is that by simply changing the size of the cache, the total number of instructions when the benchmark completes varies by about 0.01-0.1%.
>> >>>>>>> Another anomaly that I am observing is that, again, by simply changing the size of the L2, the number of overall L2 accesses per, say, 100,000 instructions can vary by over 100%.
>> >>>>>>> The following are two runs that I did on M5 with the Freqmine benchmark. The first simulation uses a 1MB 4-way L2 with a latency of 6ns, while the second simulation uses a 2MB 8-way L2 with a latency of 4.5ns. The overall accesses per 100,000 instructions are shown.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> ---------------------------------------------------------------------------------------------
>> >>>>>>> 1MB 4Way L2:
>> >>>>>>> 2:
>> >>>>>>> sim_insts                    100200001   # Number of instructions simulated
>> >>>>>>> sim_ticks                    196940000   # Number of ticks simulated
>> >>>>>>> system.l2.overall_accesses        3231   # number of overall (read+write) accesses
>> >>>>>>> system.l2.overall_hits            2515   # number of overall hits
>> >>>>>>> 3:
>> >>>>>>> sim_insts                    100300001   # Number of instructions simulated
>> >>>>>>> sim_ticks                    227453000   # Number of ticks simulated
>> >>>>>>> system.l2.overall_accesses        4656   # number of overall (read+write) accesses
>> >>>>>>> system.l2.overall_hits            3434   # number of overall hits
>> >>>>>>> 4:
>> >>>>>>> sim_insts                    100400001   # Number of instructions simulated
>> >>>>>>> sim_ticks                    154064000   # Number of ticks simulated
>> >>>>>>> system.l2.overall_accesses        1078   # number of overall (read+write) accesses
>> >>>>>>> system.l2.overall_hits             722   # number of overall hits
>> >>>>>>> 5:
>> >>>>>>> sim_insts                    100500001   # Number of instructions simulated
>> >>>>>>> sim_ticks                    155779000   # Number of ticks simulated
>> >>>>>>> system.l2.overall_accesses        1575   # number of overall (read+write) accesses
>> >>>>>>> system.l2.overall_hits            1154   # number of overall hits
>> >>>>>>> ....
>> >>>>>>> 2MB 8Way L2:
>> >>>>>>> 2:
>> >>>>>>> sim_insts                    100200001   # Number of instructions simulated
>> >>>>>>> sim_ticks                    234810000   # Number of ticks simulated
>> >>>>>>> system.l2.overall_accesses        2936   # number of overall (read+write) accesses
>> >>>>>>> system.l2.overall_hits            1163   # number of overall hits
>> >>>>>>> 3:
>> >>>>>>> sim_insts                    100300000   # Number of instructions simulated
>> >>>>>>> sim_ticks                    174173000   # Number of ticks simulated
>> >>>>>>> system.l2.overall_accesses        1496   # number of overall (read+write) accesses
>> >>>>>>> system.l2.overall_hits             803   # number of overall hits
>> >>>>>>> 4:
>> >>>>>>> sim_insts                    100400000   # Number of instructions simulated
>> >>>>>>> sim_ticks                    190135000   # Number of ticks simulated
>> >>>>>>> system.l2.overall_accesses        2290   # number of overall (read+write) accesses
>> >>>>>>> system.l2.overall_hits            1672   # number of overall hits
>> >>>>>>> 5:
>> >>>>>>> sim_insts                    100500000   # Number of instructions simulated
>> >>>>>>> sim_ticks                    213086000   # Number of ticks simulated
>> >>>>>>> system.l2.overall_accesses        4554   # number of overall (read+write) accesses
>> >>>>>>> system.l2.overall_hits            3871   # number of overall hits
>> >>>>>>> .....
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> ----------------------------------------------------------------------------
>> >>>>>>> Even if Nops are counted as instructions, I don't see how that would make overall accesses per 100,000 instructions vary by as much as 200%. How does M5 count the number of instructions?
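Steve's advice was to look at more samples rather than individual 100,000-instruction windows. The sketch below applies that idea to the four windows quoted for each configuration. It computes mean accesses per window, per-window hit rates, and a pooled hit rate (total hits over total accesses). The numbers are copied from the two dumps; the function and variable names are hypothetical. Four samples are far too few for the means to be stable, so this only illustrates the calculation.

```python
from statistics import mean, pstdev

# (system.l2.overall_accesses, system.l2.overall_hits) per 100,000-instruction
# window, copied from the two stats dumps (windows 2-5).
RUNS = {
    "1MB 4-way": [(3231, 2515), (4656, 3434), (1078, 722), (1575, 1154)],
    "2MB 8-way": [(2936, 1163), (1496, 803), (2290, 1672), (4554, 3871)],
}


def summarize(windows):
    accesses = [a for a, _ in windows]
    per_window_hit_rate = [h / a for a, h in windows]
    # Pooled rate weights each window by its access count, unlike an
    # unweighted mean of the per-window rates.
    pooled_hit_rate = sum(h for _, h in windows) / sum(accesses)
    return {
        "mean_accesses": mean(accesses),
        "stdev_accesses": pstdev(accesses),
        "per_window_hit_rate": per_window_hit_rate,
        "pooled_hit_rate": pooled_hit_rate,
    }


for name, windows in RUNS.items():
    s = summarize(windows)
    print(f"{name}: mean accesses/window {s['mean_accesses']:.0f}, "
          f"pooled hit rate {s['pooled_hit_rate']:.3f}")
# 1MB 4-way: mean accesses/window 2635, pooled hit rate 0.742
# 2MB 8-way: mean accesses/window 2819, pooled hit rate 0.666
```

The per-window hit rates for the 2MB run range from about 0.40 to 0.85. This illustrates Steve's point that individual small samples are dominated by which program phase happened to fall in each window.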
>> >>>>>>> Thanks,
>> >>>>>>> Steve
>> >>>>>>> _______________________________________________
>> >>>>>>> m5-users mailing list
>> >>>>>>> m5-us...@m5sim.org
>> >>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>> >
>> > --
>> > Gustavo Henrique Nihei
>> > LAPS - Laboratório de Automação do Projeto de Sistemas
>> > NIME - Núcleo Interdepartamental de Microeletrônica
>> > Universidade Federal de Santa Catarina
>> > Florianópolis - Santa Catarina - Brasil
>> > _______________________________________________
>> > gem5-users mailing list
>> > gem5-users@m5sim.org
>> > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users