First, sorry for bringing back an old thread, but I'm still confused about this matter. I'm not running FS. In SE mode, isn't it odd that, for a single application run with different cache configurations, the number of simulated instructions differs between simulations? With no underlying OS, just the application, I would expect the CPU to execute only the instructions provided by the application binary. Or am I missing something here?
Thanks.

On Tue, Jan 25, 2011 at 12:46 PM, Steve Reinhardt <ste...@gmail.com> wrote:

> Yes, it's almost impossible to get completely identical behavior without
> running a completely identical system. Even making the cache larger will
> make the program run faster in some phases, which will change where timer
> interrupts happen with respect to program execution.
>
> If you look at larger time windows and/or more samples, the mean behavior
> should stabilize, but trying to correlate individual small samples like
> you're doing is going to be extremely challenging.
>
> This paper focuses on these issues in multiprocessor systems, but most of
> what it talks about is relevant to uniprocessor systems running a full OS
> too:
> http://pages.cs.wisc.edu/~alaa/papers/ieeemicro03_variability.pdf
>
> Steve
>
>
> On Mon, Jan 24, 2011 at 10:34 PM, Stevenson Jian
> <stevensonj...@gmail.com> wrote:
>
>> Yes, I am running in FS mode. Is it normal for the OS to make that much
>> difference?
>> These statistics are taken after the benchmarks have started.
>> Thanks!
>> Steve
>>
>> On Tue, Jan 25, 2011 at 12:00 AM, Steve Reinhardt <ste...@gmail.com> wrote:
>>
>>> OK, sorry for the confusion; since you were running a Parsec benchmark I
>>> assumed the numbers were processor IDs. Are you running in FS mode? Are
>>> these statistics taken from the beginning when Linux is booting, or are they
>>> after the benchmark has started running?
>>>
>>> Steve
>>>
>>>
>>> On Mon, Jan 24, 2011 at 5:51 PM, Stevenson Jian <stevensonj...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for replying Steve. I only used a single processor in both
>>>> simulations.
>>>> What is shown is not the output from individual processors, but
>>>> that of the same processor at the end of every 100,000 instructions (see
>>>> sim_insts increment by 100,000 each time).
>>>>
>>>>
>>>> On Mon, Jan 24, 2011 at 7:14 PM, Steve Reinhardt <ste...@gmail.com> wrote:
>>>>
>>>>> With a multiprocessor, seemingly small changes in configuration can
>>>>> have a significant impact if they change the order in which threads grab a
>>>>> lock, or something like that. So in particular, for the stats you have
>>>>> below, it seems likely that there's some serialized computation going on
>>>>> that happened on processor 3 in the first case and on processor 5 in the
>>>>> second case.
>>>>>
>>>>> Steve
>>>>>
>>>>> On Mon, Jan 24, 2011 at 1:30 PM, Stevenson Jian <stevensonj...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> How does Timing CPU count the number of instructions? If it stalls on a
>>>>>> cache miss, do the Nops count as instructions as well? The reason why I
>>>>>> ask is that by simply changing the size of the cache, the total number of
>>>>>> instructions when the benchmark completes varies by about 0.01 - 0.1%.
>>>>>>
>>>>>> Another anomaly that I am observing is that again, by simply changing
>>>>>> the size of the L2, the number of overall L2 accesses per, let's say,
>>>>>> 100,000 instructions can vary by over 100%.
>>>>>>
>>>>>> The following are 2 runs that I did on m5 with the Freqmine benchmark.
>>>>>> The first simulation uses a 1MB 4 way L2 with a latency of 6ns while the
>>>>>> second simulation uses a 2MB 8 way L2 with a latency of 4.5ns. The
>>>>>> overall accesses per 100,000 instructions are shown.
>>>>>>
>>>>>> ----------------------------------------------------------------------------
>>>>>> 1MB 4Way L2:
>>>>>> 2:
>>>>>> sim_insts                     100200001   # Number of instructions simulated
>>>>>> sim_ticks                     196940000   # Number of ticks simulated
>>>>>> system.l2.overall_accesses    3231        # number of overall (read+write) accesses
>>>>>> system.l2.overall_hits        2515        # number of overall hits
>>>>>>
>>>>>> 3:
>>>>>> sim_insts                     100300001   # Number of instructions simulated
>>>>>> sim_ticks                     227453000   # Number of ticks simulated
>>>>>> system.l2.overall_accesses    4656        # number of overall (read+write) accesses
>>>>>> system.l2.overall_hits        3434        # number of overall hits
>>>>>>
>>>>>> 4:
>>>>>> sim_insts                     100400001   # Number of instructions simulated
>>>>>> sim_ticks                     154064000   # Number of ticks simulated
>>>>>> system.l2.overall_accesses    1078        # number of overall (read+write) accesses
>>>>>> system.l2.overall_hits        722         # number of overall hits
>>>>>>
>>>>>> 5:
>>>>>> sim_insts                     100500001   # Number of instructions simulated
>>>>>> sim_ticks                     155779000   # Number of ticks simulated
>>>>>> system.l2.overall_accesses    1575        # number of overall (read+write) accesses
>>>>>> system.l2.overall_hits        1154        # number of overall hits
>>>>>>
>>>>>> ....
>>>>>>
>>>>>> 2MB 8Way L2:
>>>>>> 2:
>>>>>> sim_insts                     100200001   # Number of instructions simulated
>>>>>> sim_ticks                     234810000   # Number of ticks simulated
>>>>>> system.l2.overall_accesses    2936        # number of overall (read+write) accesses
>>>>>> system.l2.overall_hits        1163        # number of overall hits
>>>>>>
>>>>>> 3:
>>>>>> sim_insts                     100300000   # Number of instructions simulated
>>>>>> sim_ticks                     174173000   # Number of ticks simulated
>>>>>> system.l2.overall_accesses    1496        # number of overall (read+write) accesses
>>>>>> system.l2.overall_hits        803         # number of overall hits
>>>>>>
>>>>>> 4:
>>>>>> sim_insts                     100400000   # Number of instructions simulated
>>>>>> sim_ticks                     190135000   # Number of ticks simulated
>>>>>> system.l2.overall_accesses    2290        # number of overall (read+write) accesses
>>>>>> system.l2.overall_hits        1672        # number of overall hits
>>>>>>
>>>>>> 5:
>>>>>> sim_insts                     100500000   # Number of instructions simulated
>>>>>> sim_ticks                     213086000   # Number of ticks simulated
>>>>>> system.l2.overall_accesses    4554        # number of overall (read+write) accesses
>>>>>> system.l2.overall_hits        3871        # number of overall hits
>>>>>> .....
>>>>>>
>>>>>> ----------------------------------------------------------------------------
>>>>>> Even if Nops are counted as instructions, I don't see how that would
>>>>>> make overall accesses/100,000 instructions vary by as much as 200%. How
>>>>>> does M5 count the number of instructions?
>>>>>> Thanks,
>>>>>> Steve
>>>>>>
>>>>>> _______________________________________________
>>>>>> m5-users mailing list
>>>>>> m5-us...@m5sim.org
>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users

--
Gustavo Henrique Nihei
LAPS - Laboratório de Automação do Projeto de Sistemas
NIME - Núcleo Interdepartamental de Microeletrônica
Universidade Federal de Santa Catarina
Florianópolis - Santa Catarina - Brasil
_______________________________________________
gem5-users mailing list
gem5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
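
As a rough illustration of the point made in the thread about looking at larger time windows, the sketch below takes the eight L2 samples quoted above and compares the two cache configurations window by window and then in aggregate. The numbers are copied from the quoted stats; the names `RUNS`, `totals`, and `relative_gap` are made up for this example and are not part of m5/gem5. It assumes the per-window counters were reset between dumps, which is how the quoted output reads.

```python
# Per-window L2 statistics copied from the two runs quoted in this thread.
# Each tuple is (overall_accesses, overall_hits) for one 100,000-instruction
# window (windows 2 through 5). Names here are illustrative only.
RUNS = {
    "1MB 4-way": [(3231, 2515), (4656, 3434), (1078, 722), (1575, 1154)],
    "2MB 8-way": [(2936, 1163), (1496, 803), (2290, 1672), (4554, 3871)],
}


def totals(windows):
    """Sum accesses and hits over all windows of one run."""
    accesses = sum(a for a, _ in windows)
    hits = sum(h for _, h in windows)
    return accesses, hits


def relative_gap(x, y):
    """Percentage by which the larger value exceeds the smaller one."""
    lo, hi = sorted((x, y))
    return 100.0 * (hi - lo) / lo


small = RUNS["1MB 4-way"]
large = RUNS["2MB 8-way"]

# Window-by-window comparison: individual samples disagree badly.
for idx, ((a1, _), (a2, _)) in enumerate(zip(small, large), start=2):
    print(f"window {idx}: {a1} vs {a2} accesses, gap {relative_gap(a1, a2):.0f}%")

# Aggregate comparison over the same four windows.
(acc1, hit1), (acc2, hit2) = totals(small), totals(large)
print(f"total: {acc1} vs {acc2} accesses, gap {relative_gap(acc1, acc2):.0f}%")
print(f"hit rate: {hit1 / acc1:.2f} (1MB 4-way) vs {hit2 / acc2:.2f} (2MB 8-way)")
```

For these samples, individual windows differ by more than 200% in access count, while the four-window totals (10540 vs 11276 accesses) differ by about 7%. Four windows are far too few to draw conclusions from, and in FS mode the same window index need not correspond to the same point in the program, so this only shows the direction of the effect.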