Hi,

Thanks for the tip. I will try what you suggested and see where things start
to diverge.
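
A minimal sketch of that comparison, assuming each short run (10, 50, 100
insts with --maxinsts) leaves behind stats text made of "name value" lines.
The stat names in the usage note and the helper names are placeholders for
illustration, not taken from an actual run, and the ruby cycle count may
need to be copied out of the ruby stats output by hand first.

```python
import re

# Assumed line shape: "<stat name> <integer> [# comment]".
STAT_LINE = re.compile(r"^(\S+)\s+(-?\d+)(?:\s|$)")


def parse_stats(text):
    """Return {stat_name: int_value} for every line shaped like 'name value'."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def first_divergence(runs, stat_a, stat_b, tolerance=0.0):
    """Find the shortest run in which two cycle counts disagree.

    runs is a list of (max_insts, stats_text) pairs, one per short run.
    Returns (max_insts, value_a, value_b) for the first run, in order of
    increasing max_insts, whose relative difference exceeds tolerance,
    or None if the two stats agree in every run.
    """
    for max_insts, text in sorted(runs, key=lambda run: run[0]):
        stats = parse_stats(text)
        a, b = stats[stat_a], stats[stat_b]
        relative = abs(a - b) / max(a, b, 1)
        if relative > tolerance:
            return max_insts, a, b
    return None
```

Calling first_divergence(runs, "system.cpu0.numCycles", "ruby_cycles") with
one entry per run would then point at the shortest run in which the two
counts disagree, which narrows down where to look next.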

Madhavan

On Wed, Jan 25, 2012 at 2:08 PM, Korey Sewell <[email protected]> wrote:

> I'm having a little trouble following this thread, but if you have time,
> I'd suggest you run for a few short periods of time and then find out when
> the stats diverged. For example, run your application for 10, 50, and 100
> insts (using --maxinsts parameter).
>
> Those time granularities may be too short, but eventually you will find
> out where exactly the cycle counts are off and then you can better pinpoint
> what's happening.
>
>
> On Wed, Jan 25, 2012 at 2:47 AM, Madhavan manivannan <
> [email protected]> wrote:
>
>> Hi Nilay,
>>
>> Thanks! I still have a few more questions based on what you said.
>>
>> 1.  I checked cpu/simple/timing.cc again and it seems like cycles are
>> only accounted for Events (during the period when the context is
>> available). If the need is to measure the time spent inside the parallel
>> regions, which figure would be more appropriate (rubycycles or
>> m5.numCycles), considering that the difference is not negligible? I would
>> assume ruby and m5 to also have progressed the same number of ticks
>> (irrespective of events in the cpu or cache), which when converted to
>> cycles should give the same number. There still seems to be a difference
>> which I find hard to reason about.
>>
>> 2. The numInst stat variable in cpu/simple/base.hh seems to be
>> incremented using the countInst() function (assuming TimingSimpleCPU).
>> Since this function is called whenever an instruction completes execution
>> and since it does not work on the granularity of a microop, I am still
>> doubtful about the missing accesses. I have a similar question to the one
>> before, however: in case I want to count the number of instructions
>> executed (Ifetch accesses), which stat would be more appropriate?
>>
>> 3. I have made an attempt to rephrase the last question. I hope it is
>> more understandable now. Assuming there is no need for ruby and m5 to have
>> progressed the same number of cycles, why is there a difference between the
>> sum of cumulative latencies (this figure is obtained by adding the
>> latencies of all cache access requests that reach a specific sequencer) and
>> the number of rubycycles progressed? Is the difference because rubycycles
>> include the CPU latencies in addition to latencies from cache access
>> requests?
>>
>>
>> Madhavan
>>
>>
>> On Tue, Jan 24, 2012 at 10:27 PM, Nilay Vaish <[email protected]> wrote:
>>
>>> On Mon, 23 Jan 2012, Madhavan manivannan wrote:
>>>
>>>> Hi,
>>>>
>>>> I am simulating X86 TimingSimple CPU (16 cores) with Ruby
>>>> (MESI_CMP_Directory protocol) memory model. The stats for m5 and ruby
>>>> are reset at the beginning of the parallel region and dumped at the end
>>>> of the parallel region. The following differences are however observed
>>>> between the stats generated by m5 and ruby.
>>>>
>>>> 1. The number of cycles (cpuxx.numcycles) reported in M5 stats file for
>>>> each core is different. However the number of cycles reported by ruby
>>>> for each processor(cache) is the same. Why is it different in M5 and not
>>>> in ruby? Since they use the same event queue I was expecting similar
>>>> values (number of cycles simulated) in ruby and m5 stats. The stats
>>>> however show that ruby cycles differ from m5 cycles (between 0 and -20%)
>>>> for different apps. Please correct me if I have totally missed something
>>>> here.
>>>>
>>>>
>>>
>>> It might be that the cpu cycles accounted for are the ones when a thread
>>> context was available for execution.
>>>
>>>
>>>
>>>> 2. I was expecting to see similar values for the total number of
>>>> instructions executed by each core (M5 stats) and the total number of
>>>> IFetch Events (Ruby Stats) since Instruction Fetch requests in
>>>> TimingSimpleCPU use the icacheport, which in turn directs the request to
>>>> rubysequencer. However the number of IFetch events reported by Ruby is
>>>> around 30% lower than m5 stats.
>>>>
>>>
>>> This is possible since an instruction is broken down into microops and
>>> these microops are not fetched from the cache. It might be that the
>>> instruction count you refer to is the number of microops that were executed.
>>>
>>>
>>>
>>>> 3. Why is there a considerable difference between the cumulative sum of
>>>> miss latencies measured at each sequencer and the total number of ruby
>>>> cycles simulated? Is it because rubycycle includes CPU latencies in
>>>> addition to cache latencies?
>>>>
>>>
>>> You need to rephrase the question.
>>>
>>> --
>>> Nilay
>>> _______________________________________________
>>> gem5-users mailing list
>>> [email protected]
>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>
>>
>
>
>
> --
> - Korey
