Oops! This is the first thing I should have tried. Thanks.

On Thu, Jan 26, 2012 at 5:10 AM, Korey Sewell <[email protected]> wrote:
> Can you not include the header file for the g_system_ptr?
>
> Find it with grep:
> grep -r "g_system_ptr;" src/
>
> If you include the requisite header file, you should have access to the
> pointer...
>
> On Wed, Jan 25, 2012 at 11:45 AM, Madhavan manivannan <[email protected]> wrote:
>
>> Hi,
>>
>> The stats start to diverge after the simulator encounters the m5_reset op.
>> This is because the global variable that signals a reset (in ruby) took
>> effect only when the Sequencer encountered the next request (and not on
>> the same cycle). This takes me to my next question: is there a simple way
>> to also trigger g_system_ptr->clearStats() when m5 encounters a reset op
>> (on the same cycle)?
>>
>> Regarding the instruction cache access count, I realized that I had
>> overlooked the interaction of microops with the instruction count
>> function. The comment made by Nilay certainly helped clarify the issue.
>>
>> Thanks again!
>>
>> Madhavan
>>
>> On Wed, Jan 25, 2012 at 2:14 PM, Madhavan manivannan <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the tip. I will try what you suggested and see where things
>>> start to diverge.
>>>
>>> Madhavan
>>>
>>> On Wed, Jan 25, 2012 at 2:08 PM, Korey Sewell <[email protected]> wrote:
>>>
>>>> I'm having a little trouble following this thread, but if you have
>>>> time, I'd suggest you run for a few short periods of time and then find
>>>> out when the stats diverged. For example, run your application for 10,
>>>> 50, and 100 insts (using the --maxinsts parameter).
>>>>
>>>> Those time granularities may be too short, but eventually you will find
>>>> out where exactly the cycle counts are off and then you can better
>>>> pinpoint what's happening.
>>>>
>>>> On Wed, Jan 25, 2012 at 2:47 AM, Madhavan manivannan <[email protected]> wrote:
>>>>
>>>>> Hi Nilay,
>>>>>
>>>>> Thanks! I still have a few more questions based on what you said.
>>>>>
>>>>> 1. I checked cpu/simple/timing.cc again and it seems that cycles are
>>>>> only accounted for Events (during the period when the context is
>>>>> available). If the need is to measure the time spent inside the
>>>>> parallel regions, which figure would be more appropriate (rubycycles
>>>>> or m5.numCycles), considering that the difference is not negligible?
>>>>> I would assume ruby and m5 to have progressed the same number of ticks
>>>>> (irrespective of events in the cpu or cache), which when converted to
>>>>> cycles should give the same number. There still seems to be a
>>>>> difference which I find hard to reason about.
>>>>>
>>>>> 2. The numInst stat variable in cpu/simple/base.hh seems to be
>>>>> incremented using the countInst() function (assuming TimingSimpleCPU).
>>>>> Since this function is called whenever an instruction completes
>>>>> execution, and since it does not work at the granularity of a microop,
>>>>> I am still doubtful about the missing accesses. I have a similar
>>>>> question as before, however: if I want to count the number of
>>>>> instructions executed (Ifetch accesses), which stat would be more
>>>>> appropriate?
>>>>>
>>>>> 3. I have made an attempt to rephrase the last question. I hope it is
>>>>> more understandable now. Assuming there is no need for ruby and m5 to
>>>>> have progressed the same number of cycles, why is there a difference
>>>>> between the sum of cumulative latencies (this figure is obtained by
>>>>> adding the latencies of all cache access requests that reach a
>>>>> specific sequencer) and the number of rubycycles progressed? Is the
>>>>> difference because rubycycles include the CPU latencies in addition to
>>>>> latencies from cache access requests?
>>>>>
>>>>> Madhavan
>>>>>
>>>>> On Tue, Jan 24, 2012 at 10:27 PM, Nilay Vaish <[email protected]> wrote:
>>>>>
>>>>>> On Mon, 23 Jan 2012, Madhavan manivannan wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am simulating X86 TimingSimple CPU (16 cores) with the Ruby
>>>>>>> (MESI_CMP_Directory protocol) memory model. The stats for m5 and
>>>>>>> ruby are reset at the beginning of the parallel region and dumped at
>>>>>>> the end of the parallel region. The following differences are
>>>>>>> however observed between the stats generated by m5 and ruby.
>>>>>>>
>>>>>>> 1. The number of cycles (cpuxx.numcycles) reported in the M5 stats
>>>>>>> file is different for each core. However, the number of cycles
>>>>>>> reported by ruby for each processor (cache) is the same. Why is it
>>>>>>> different in M5 and not in ruby? Since they use the same event queue
>>>>>>> I was expecting similar values (number of cycles simulated) in ruby
>>>>>>> and m5 stats. The stats however show that ruby cycles differ from m5
>>>>>>> cycles (between 0 to -20%) for different apps. Please correct me if
>>>>>>> I have totally missed something here.
>>>>>>
>>>>>> It might be that the cpu cycles accounted for are the ones when a
>>>>>> thread context was available for execution.
>>>>>>
>>>>>>> 2. I was expecting to see similar values for the total number of
>>>>>>> instructions executed by each core (M5 stats) and the total number
>>>>>>> of IFetch Events (Ruby stats), since Instruction Fetch requests in
>>>>>>> TimingSimpleCPU use the icacheport, which in turn directs the
>>>>>>> request to the rubysequencer. However, the number of IFetch events
>>>>>>> reported by Ruby is around 30% lower than the m5 stats.
>>>>>>
>>>>>> This is possible since an instruction is broken down into microops
>>>>>> and these microops are not fetched from the cache. It might be that
>>>>>> the instruction count you refer to is the number of microops that
>>>>>> were executed.
>>>>>>
>>>>>>> 3. Why is there a considerable difference between the cumulative sum
>>>>>>> of miss latencies measured at each sequencer and the total number of
>>>>>>> ruby cycles simulated? Is it because rubycycle includes CPU
>>>>>>> latencies in addition to cache latencies?
>>>>>>
>>>>>> You need to rephrase the question.
>>>>>>
>>>>>> --
>>>>>> Nilay
>>>>
>>>> --
>>>> - Korey
>
> --
> - Korey
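
For anyone repeating the short-run comparison suggested in the quoted thread, here is a minimal sketch of how the per-core gap between m5 cycles and ruby cycles could be tabulated after each run. It assumes the m5 stats file uses lines of the form "name value # comment" and that the per-core stat is named like system.cpu00.numCycles; the exact stat names, the SAMPLE text, and the ruby cycle count passed in by hand are all made up for illustration and would need to be adapted to a real run.

```python
import re

# Assumed line shape: "system.cpu03.numCycles   1200   # comment".
# The stat name is an assumption; adjust the pattern to the real stats file.
NUM_CYCLES = re.compile(r"^(\S*cpu\d+)\.numCycles\s+(\d+)")


def cpu_cycles(stats_text):
    """Return {cpu_name: numCycles} parsed from stats-file text."""
    cycles = {}
    for line in stats_text.splitlines():
        match = NUM_CYCLES.match(line.strip())
        if match:
            cycles[match.group(1)] = int(match.group(2))
    return cycles


def relative_gap(cycles, ruby_cycles):
    """Per-CPU (numCycles - ruby_cycles) / ruby_cycles."""
    return {cpu: (n - ruby_cycles) / ruby_cycles for cpu, n in cycles.items()}


# Made-up sample data, not output from a real simulation.
SAMPLE = """\
system.cpu00.numCycles      1000    # made-up value
system.cpu01.numCycles       800    # made-up value
system.cpu00.numInsts        500    # ignored by the parser
"""

if __name__ == "__main__":
    # The ruby cycle count (1000 here) is supplied by hand in this sketch.
    for cpu, gap in sorted(relative_gap(cpu_cycles(SAMPLE), 1000).items()):
        print(f"{cpu}: {gap:+.1%}")
```

Running this over the stats from successive short runs (for example with increasing --maxinsts values) should show the first run in which the gap becomes non-zero.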
_______________________________________________ gem5-users mailing list [email protected] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
