Re: [gem5-users] Differences in M5 and Ruby stats for the same application

Madhavan manivannan Thu, 26 Jan 2012 00:55:26 -0800

Oops! This is the first thing I should have tried. Thanks.


On Thu, Jan 26, 2012 at 5:10 AM, Korey Sewell <[email protected]> wrote:

> Can you not include the header file for the g_system_ptr?
>
> Find it with grep:
> grep -r "g_system_ptr;" src/
>
> If you include the requisite header file, you should have access to the
> pointer...
>
>
> On Wed, Jan 25, 2012 at 11:45 AM, Madhavan manivannan <
> [email protected]> wrote:
>
>> Hi,
>>
>> The stats start to diverge after the simulator encounters the m5_reset op.
>> This is because the global variable that signals a reset (in Ruby) only
>> triggered it when the Sequencer encountered the next request (and not on
>> the same cycle). This brings me to my next question: is there a simple way
>> to also trigger g_system_ptr->clearStats() when m5 encounters a reset op
>> (on the same cycle)?
>>
>> Regarding the instruction cache access count, I realized that I had
>> overlooked the interaction of microops with the instruction count function.
>> The comment made by Nilay certainly helped clarify the issue.
>>
>> Thanks again!
>>
>> Madhavan
>>
>> On Wed, Jan 25, 2012 at 2:14 PM, Madhavan manivannan <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the tip. I will try what you suggested and see where things
>>> start to diverge.
>>>
>>> Madhavan
>>>
>>>
>>> On Wed, Jan 25, 2012 at 2:08 PM, Korey Sewell <[email protected]> wrote:
>>>
>>>> I'm having a little trouble following this thread, but if you have
>>>> time, I'd suggest you run for a few short periods of time and then find out
>>>> when the stats diverged. For example, run your application for 10, 50, and
>>>> 100 insts (using --maxinsts parameter).
>>>>
>>>> Those time granularities may be too short, but eventually you will find
>>>> out where exactly the cycle counts are off and then you can better pinpoint
>>>> what's happening.
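
The narrowing-down approach described above can be sketched as a small helper. This is only an illustration under assumptions: the cycle counts are taken to have been read by hand from the m5 and Ruby stats files after each run, and the function name `first_divergence` and its `tolerance` parameter are hypothetical, not part of gem5.

```python
def first_divergence(samples, tolerance=0.0):
    """Return the smallest instruction budget at which the two cycle
    counts differ by more than `tolerance` (relative), or None.

    `samples` maps an instruction budget (the value given to --maxinsts)
    to a pair (m5_cycles, ruby_cycles) copied from the stats output.
    """
    for max_insts in sorted(samples):
        m5_cycles, ruby_cycles = samples[max_insts]
        baseline = max(m5_cycles, ruby_cycles, 1)  # avoid dividing by zero
        if abs(m5_cycles - ruby_cycles) / baseline > tolerance:
            return max_insts
    return None


# Made-up numbers, purely for illustration (not measured results).
runs = {10: (40, 40), 50: (210, 210), 100: (450, 400)}
print(first_divergence(runs))
```

Once a diverging budget is found, the same helper can be reused on a finer set of budgets between the last matching run and the first diverging one.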
>>>>
>>>>
>>>> On Wed, Jan 25, 2012 at 2:47 AM, Madhavan manivannan <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Nilay,
>>>>>
>>>>> Thanks! I still have a few more questions based on what you said.
>>>>>
>>>>> 1. I checked cpu/simple/timing.cc again, and it seems that cycles are
>>>>> only accounted for events (during the period when the context is
>>>>> available). If the goal is to measure the time spent inside the parallel
>>>>> regions, which figure would be more appropriate (rubycycles or
>>>>> m5.numCycles), considering that the difference is not negligible? I
>>>>> would assume Ruby and m5 to have progressed the same number of ticks
>>>>> (irrespective of events in the CPU or cache), which when converted to
>>>>> cycles should give the same number. There still seems to be a
>>>>> difference, which I find hard to reason about.
>>>>>
>>>>> 2. The numInst stat variable in cpu/simple/base.hh seems to be
>>>>> incremented using the countInst() function (assuming TimingSimpleCPU).
>>>>> Since this function is called whenever an instruction completes
>>>>> execution, and since it does not work at the granularity of a microop,
>>>>> I am still doubtful about the missing accesses. I have a similar
>>>>> question to the one before, however: if I want to count the number of
>>>>> instructions executed (IFetch accesses), which stat would be more
>>>>> appropriate?
>>>>>
>>>>> 3. I have made an attempt to rephrase the last question; I hope it is
>>>>> more understandable now. Assuming there is no need for Ruby and m5 to
>>>>> have progressed the same number of cycles, why is there a difference
>>>>> between the sum of cumulative latencies (this figure is obtained by
>>>>> adding the latencies of all cache access requests that reach a specific
>>>>> sequencer) and the number of rubycycles progressed? Is the difference
>>>>> because rubycycles include the CPU latencies in addition to latencies
>>>>> from cache access requests?
>>>>>
>>>>>
>>>>> Madhavan
>>>>>
>>>>>
>>>>> On Tue, Jan 24, 2012 at 10:27 PM, Nilay Vaish <[email protected]> wrote:
>>>>>
>>>>>> On Mon, 23 Jan 2012, Madhavan manivannan wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am simulating an X86 TimingSimple CPU (16 cores) with the Ruby
>>>>>>> (MESI_CMP_Directory protocol) memory model. The stats for m5 and Ruby
>>>>>>> are reset at the beginning of the parallel region and dumped at the
>>>>>>> end of the parallel region. The following differences are, however,
>>>>>>> observed between the stats generated by m5 and Ruby.
>>>>>>>
>>>>>>> 1. The number of cycles (cpuxx.numcycles) reported in the M5 stats
>>>>>>> file differs between cores. However, the number of cycles reported by
>>>>>>> Ruby for each processor (cache) is the same. Why is it different in M5
>>>>>>> and not in Ruby? Since they use the same event queue, I was expecting
>>>>>>> similar values (number of cycles simulated) in the Ruby and m5 stats.
>>>>>>> The stats, however, show that Ruby cycles differ from m5 cycles
>>>>>>> (between 0 and -20%) for different apps. Please correct me if I have
>>>>>>> totally missed something here.
>>>>>>>
>>>>>>
>>>>>> It might be that the cpu cycles accounted for are the ones when a
>>>>>> thread context was available for execution.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> 2. I was expecting to see similar values for the total number of
>>>>>>> instructions executed by each core (M5 stats) and the total number of
>>>>>>> IFetch events (Ruby stats), since instruction fetch requests in
>>>>>>> TimingSimpleCPU use the icacheport, which in turn directs the request
>>>>>>> to the rubysequencer. However, the number of IFetch events reported by
>>>>>>> Ruby is around 30% lower than in the m5 stats.
>>>>>>>
>>>>>>
>>>>>> This is possible since an instruction is broken down into microops
>>>>>> and these microops are not fetched from the cache. It might be that the
>>>>>> instruction count you refer to is the number of microops that were
>>>>>> executed.
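
The effect described above can be illustrated with a toy count. This is a sketch under a simplifying assumption, namely that each macro-instruction causes exactly one fetch no matter how many microops it decodes into; the function name and the instruction mix are hypothetical and are not taken from gem5 or from any measured run.

```python
def fetch_and_uop_counts(uops_per_instruction):
    """Given the number of microops each macro-instruction decodes into,
    return (instruction fetches, microops executed), assuming one fetch
    per macro-instruction."""
    fetches = len(uops_per_instruction)
    uops = sum(uops_per_instruction)
    return fetches, uops


# Hypothetical mix: seven instructions, three of which decode into two microops.
fetches, uops = fetch_and_uop_counts([1, 2, 1, 2, 1, 1, 2])
gap = 1 - fetches / uops  # fraction by which fetches fall short of microops
print(fetches, uops, f"{gap:.0%}")
```

With this made-up mix, a counter that ticks per microop ends up noticeably higher than a counter that ticks per fetch, which is the kind of gap the reply above suggests as a possible explanation.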
>>>>>>
>>>>>>
>>>>>>
>>>>>>> 3. Why is there a considerable difference between the cumulative sum
>>>>>>> of miss latencies measured at each sequencer and the total number of
>>>>>>> Ruby cycles simulated? Is it because rubycycle includes CPU latencies
>>>>>>> in addition to cache latencies?
>>>>>>>
>>>>>>
>>>>>> You need to rephrase the question.
>>>>>>
>>>>>> --
>>>>>> Nilay
>>>>>> _______________________________________________
>>>>>> gem5-users mailing list
>>>>>> [email protected]
>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> - Korey
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
> --
> - Korey
>
>

