Hi,

Thanks for the tip. I will try what you suggested and see where things start to diverge.
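The sweep Korey describes (runs capped at 10, 50, and 100 instructions with --maxinsts) produces one pair of cycle counts per run. Below is a minimal sketch of scanning those pairs for the first point where they disagree. It assumes the M5 and Ruby cycle counts have already been copied out of the stats files. The function name, the tolerance, and the numbers in the example are hypothetical and are not part of gem5.

```python
"""Hypothetical helper for a --maxinsts sweep.

Assumption: for each instruction budget, the cycle counts reported by M5
and by Ruby were copied out of the stats files beforehand; neither the
run harness nor this function is part of gem5.
"""

from typing import Dict, Optional, Tuple


def first_divergence(
    runs: Dict[int, Tuple[int, int]], rel_tol: float = 0.01
) -> Optional[int]:
    """Return the smallest instruction budget whose two cycle counts differ
    by more than rel_tol (relative to the larger count), or None."""
    for budget in sorted(runs):
        m5_cycles, ruby_cycles = runs[budget]
        larger = max(m5_cycles, ruby_cycles)
        if larger == 0:
            continue  # nothing simulated yet, so nothing can diverge
        if abs(m5_cycles - ruby_cycles) / larger > rel_tol:
            return budget
    return None


if __name__ == "__main__":
    # Made-up numbers purely to show the call; not measured data.
    sweep = {10: (120, 120), 50: (610, 605), 100: (1400, 1210)}
    print(first_divergence(sweep))  # prints 100
```

Once the first diverging budget is known, additional runs between the last agreeing budget and that one can narrow the window further, in the spirit of Korey's suggestion.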
Madhavan

On Wed, Jan 25, 2012 at 2:08 PM, Korey Sewell <[email protected]> wrote:
> I'm having a little trouble following this thread, but if you have time,
> I'd suggest you run for a few short periods of time and then find out when
> the stats diverged. For example, run your application for 10, 50, and 100
> insts (using the --maxinsts parameter).
>
> Those time granularities may be too short, but eventually you will find
> out where exactly the cycle counts are off, and then you can better pinpoint
> what's happening.
>
>
> On Wed, Jan 25, 2012 at 2:47 AM, Madhavan manivannan <[email protected]> wrote:
>
>> Hi Nilay,
>>
>> Thanks! I still have a few more questions based on what you said.
>>
>> 1. I checked cpu/simple/timing.cc again, and it seems that cycles are
>> only accounted for on events (during the period when the context is
>> available). If the goal is to measure the time spent inside the parallel
>> regions, which figure would be more appropriate (rubycycles or
>> m5.numCycles), considering that the difference is not negligible? I would
>> assume Ruby and M5 have progressed the same number of ticks (irrespective
>> of events in the CPU or cache), which when converted to cycles should give
>> the same number. There still seems to be a difference, which I find hard
>> to reason about.
>>
>> 2. The numInst stat variable in cpu/simple/base.hh seems to be
>> incremented by the countInst() function (assuming TimingSimpleCPU).
>> Since this function is called whenever an instruction completes execution,
>> and since it does not work at the granularity of a microop, I am still
>> doubtful about the missing accesses. I have a similar question as before:
>> if I want to count the number of instructions executed (Ifetch accesses),
>> which stat would be more appropriate?
>>
>> 3. I have made an attempt to rephrase the last question. I hope it is
>> more understandable now. Assuming there is no need for Ruby and M5 to have
>> progressed the same number of cycles, why is there a difference between
>> the sum of cumulative latencies (obtained by adding the latencies of all
>> cache access requests that reach a specific sequencer) and the number of
>> rubycycles progressed? Is the difference because rubycycles include CPU
>> latencies in addition to the latencies of cache access requests?
>>
>>
>> Madhavan
>>
>>
>> On Tue, Jan 24, 2012 at 10:27 PM, Nilay Vaish <[email protected]> wrote:
>>
>>> On Mon, 23 Jan 2012, Madhavan manivannan wrote:
>>>
>>>> Hi,
>>>>
>>>> I am simulating an X86 TimingSimple CPU (16 cores) with the Ruby
>>>> (MESI_CMP_Directory protocol) memory model. The stats for M5 and Ruby
>>>> are reset at the beginning of the parallel region and dumped at the end
>>>> of the parallel region. However, the following differences are observed
>>>> between the stats generated by M5 and Ruby.
>>>>
>>>> 1. The number of cycles (cpuxx.numcycles) reported in the M5 stats file
>>>> differs from core to core. However, the number of cycles reported by Ruby
>>>> for each processor (cache) is the same. Why does it differ in M5 and not
>>>> in Ruby? Since they use the same event queue, I was expecting similar
>>>> values (number of cycles simulated) in the Ruby and M5 stats. The stats,
>>>> however, show that Ruby cycles differ from M5 cycles (by between 0 and
>>>> -20%) for different apps. Please correct me if I have totally missed
>>>> something here.
>>>>
>>>
>>> It might be that the CPU cycles accounted for are the ones when a thread
>>> context was available for execution.
>>>
>>>
>>>> 2. I was expecting to see similar values for the total number of
>>>> instructions executed by each core (M5 stats) and the total number of
>>>> IFetch events (Ruby stats), since instruction fetch requests in
>>>> TimingSimpleCPU use the icacheport, which in turn directs the request to
>>>> the Ruby sequencer. However, the number of IFetch events reported by Ruby
>>>> is around 30% lower than in the M5 stats.
>>>>
>>>
>>> This is possible since an instruction is broken down into microops and
>>> these microops are not fetched from the cache. It might be that the
>>> instruction count you refer to is the number of microops that were executed.
>>>
>>>
>>>> 3. Why is there a considerable difference between the cumulative sum of
>>>> miss latencies measured at each sequencer and the total number of Ruby
>>>> cycles simulated? Is it because rubycycle includes CPU latencies in
>>>> addition to cache latencies?
>>>>
>>>
>>> You need to rephrase the question.
>>>
>>> --
>>> Nilay
>>> _______________________________________________
>>> gem5-users mailing list
>>> [email protected]
>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>
>>
>>
>
>
>
> --
> - Korey
>
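Nilay's suggestion for question 1 and Madhavan's guess for question 3 can both be checked against a toy model of one core's timeline. The sketch below is hypothetical and does not mirror gem5's code. It assumes that the CPU counts cycles only while its thread context is active, that Ruby's count is the elapsed time of the whole region, and that a blocking, TimingSimpleCPU-style core never has overlapping accesses, so CPU work, access latency, and idle time add up to the elapsed time.

```python
"""Toy per-core timeline model (hypothetical, not gem5 code).

Assumptions, all illustrative:
  * a core's time in the measured region splits into CPU work, waiting on
    an access issued to the Ruby sequencer, and idle time with no active
    thread context; segments never overlap;
  * the CPU's own cycle count covers only time with an active context;
  * Ruby's cycle count is the elapsed time of the region, the same for
    every sequencer.
"""

from dataclasses import dataclass
from typing import List, Tuple

CPU, MEM, IDLE = "cpu", "mem", "idle"


@dataclass
class Accounting:
    elapsed: int      # what a global (Ruby-style) count would report
    active: int       # what an active-only (CPU-style) count would report
    mem_latency: int  # sum of access latencies seen at the sequencer


def account(timeline: List[Tuple[str, int]]) -> Accounting:
    """Summarize a list of (segment kind, length in cycles) pairs."""
    elapsed = sum(length for _, length in timeline)
    idle = sum(length for kind, length in timeline if kind == IDLE)
    mem = sum(length for kind, length in timeline if kind == MEM)
    return Accounting(elapsed=elapsed, active=elapsed - idle, mem_latency=mem)


if __name__ == "__main__":
    # Made-up segment lengths in cycles, chosen only to show the effect.
    core0 = [(CPU, 40), (MEM, 100), (CPU, 30), (MEM, 50), (IDLE, 80)]
    print(account(core0))
```

In this model the active count falls short of the elapsed count by exactly the idle time, and the summed latency falls short by the CPU work plus the idle time; both gaps vanish only if the core is never idle and never computes between accesses. Whether gem5 accounts time this way would need to be confirmed in cpu/simple/timing.cc.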
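Nilay's answer to question 2 can be illustrated the same way. This sketch assumes, for illustration only, that every macro-instruction costs exactly one fetch through the instruction port and that its micro-ops come from the decoder without further fetches; real x86 fetch behavior may differ. The function name and micro-op counts are made up.

```python
"""Toy comparison of instruction-count granularities (hypothetical).

Assumption: each macro-instruction is fetched once, while its micro-ops
are produced by the decoder and never fetched. A stat that counted
micro-ops would then exceed the number of fetches Ruby sees.
"""

from typing import List, Tuple


def fetch_vs_count(uops_per_macro: List[int]) -> Tuple[int, int, float]:
    """Return (fetches, micro-ops, fraction by which fetches fall short)."""
    fetches = len(uops_per_macro)    # one fetch per macro-instruction
    micro_ops = sum(uops_per_macro)  # what a micro-op counter would see
    shortfall = 1.0 - fetches / micro_ops if micro_ops else 0.0
    return fetches, micro_ops, shortfall


if __name__ == "__main__":
    # Made-up micro-op counts for five macro-instructions.
    print(fetch_vs_count([1, 2, 1, 3, 1]))  # prints (5, 8, 0.375)
```

If numInst turns out to count whole instructions rather than micro-ops, as Madhavan's reading of countInst() suggests, this explanation would not by itself account for the roughly 30% gap.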
