I'm having a little trouble following this thread, but if you have time, I'd suggest running the simulation for a few short intervals and finding out when the stats start to diverge. For example, run your application for 10, 50, and 100 instructions (using the --maxinsts parameter).
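One way to automate that comparison is sketched below in Python. It assumes each run's counters have been collected as plain "name value # comment" lines, which is roughly the layout of an m5 stats file; the Ruby cycle count may live in a separate file with a different layout, so getting both values into this form is left as an assumption. The function names and the stat keys in the usage note are hypothetical placeholders, not names taken from gem5.

```python
def parse_stats(text):
    """Parse 'name value # comment' lines into a dict of floats.

    Lines whose second field is not numeric (banners, blank lines)
    are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue
    return stats


def relative_gap(stats, cpu_key, ruby_key):
    """Return (cpu - ruby) / ruby for the two cycle counters."""
    return (stats[cpu_key] - stats[ruby_key]) / stats[ruby_key]


def first_divergence(runs, cpu_key, ruby_key, tolerance=0.01):
    """Find the shortest run whose counters disagree.

    runs maps an instruction limit (the value given to --maxinsts)
    to the stats text of that run. Returns (limit, gap) for the
    smallest limit where |gap| exceeds tolerance, or None if all
    runs agree within tolerance.
    """
    for limit in sorted(runs):
        gap = relative_gap(parse_stats(runs[limit]), cpu_key, ruby_key)
        if abs(gap) > tolerance:
            return limit, gap
    return None
```

Calling `first_divergence(runs, "system.cpu00.numCycles", "ruby_cycles")` with one entry per instruction limit would report the shortest run in which the two counters already disagree; both key names here are illustrative and need to be replaced with whatever your stats files actually contain.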
Those granularities may be too short, but eventually you will find exactly where the cycle counts go off, and then you can better pinpoint what's happening.

On Wed, Jan 25, 2012 at 2:47 AM, Madhavan manivannan <[email protected]> wrote:

> Hi Nilay,
>
> Thanks! I still have a few more questions based on what you said.
>
> 1. I checked cpu/simple/timing.cc again and it seems that cycles are
> only accounted for events (during the period when the context is
> available). If the need is to measure the time spent inside the
> parallel regions, which figure would be more appropriate (rubycycles
> or m5.numCycles), considering that the difference is not negligible?
> I would assume ruby and m5 to have progressed the same number of
> ticks (irrespective of events in the CPU or cache), which when
> converted to cycles should give the same number. There still seems to
> be a difference, which I find hard to reason about.
>
> 2. The numInst stat variable in cpu/simple/base.hh seems to be
> incremented using the countInst() function (assuming
> TimingSimpleCPU). Since this function is called whenever an
> instruction completes execution, and since it does not work at the
> granularity of a microop, I am still doubtful about the missing
> accesses. However, I have a question similar to the earlier one: if I
> want to count the number of instructions executed (IFetch accesses),
> which stat would be more appropriate?
>
> 3. I have made an attempt to rephrase the last question. I hope it is
> more understandable now. Assuming there is no need for ruby and m5 to
> have progressed the same number of cycles, why is there a difference
> between the sum of cumulative latencies (this figure is obtained by
> adding the latencies of all cache access requests that reach a
> specific sequencer) and the number of rubycycles progressed? Is the
> difference because rubycycles include the CPU latencies in addition
> to latencies from cache access requests?
>
> Madhavan
>
> On Tue, Jan 24, 2012 at 10:27 PM, Nilay Vaish <[email protected]> wrote:
>
>> On Mon, 23 Jan 2012, Madhavan manivannan wrote:
>>
>>> Hi,
>>>
>>> I am simulating an X86 TimingSimple CPU (16 cores) with the Ruby
>>> (MESI_CMP_Directory protocol) memory model. The stats for m5 and
>>> ruby are reset at the beginning of the parallel region and dumped
>>> at the end of the parallel region. However, the following
>>> differences are observed between the stats generated by m5 and
>>> ruby.
>>>
>>> 1. The number of cycles (cpuxx.numcycles) reported in the M5 stats
>>> file is different for each core. However, the number of cycles
>>> reported by ruby for each processor (cache) is the same. Why is it
>>> different in M5 and not in ruby? Since they use the same event
>>> queue, I was expecting similar values (number of cycles simulated)
>>> in the ruby and m5 stats. The stats, however, show that ruby cycles
>>> differ from m5 cycles (between 0 and -20%) for different apps.
>>> Please correct me if I have totally missed something here.
>>
>> It might be that the cpu cycles accounted for are the ones when a
>> thread context was available for execution.
>>
>>> 2. I was expecting to see similar values for the total number of
>>> instructions executed by each core (M5 stats) and the total number
>>> of IFetch events (Ruby stats), since instruction fetch requests in
>>> TimingSimpleCPU use the icacheport, which in turn directs the
>>> request to the rubysequencer. However, the number of IFetch events
>>> reported by Ruby is around 30% lower than the m5 stats.
>>
>> This is possible since an instruction is broken down into microops
>> and these microops are not fetched from the cache. It might be that
>> the instruction count you refer to is the number of microops that
>> were executed.
>>
>>> 3. Why is there a considerable difference between the cumulative
>>> sum of miss latencies measured at each sequencer and the total
>>> number of ruby cycles simulated? Is it because rubycycle includes
>>> CPU latencies in addition to cache latencies?
>>
>> You need to rephrase the question.
>>
>> --
>> Nilay
>> _______________________________________________
>> gem5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

--
- Korey
