Re: [gem5-users] Switching to Arm Detailed CPU after Checkpoint Restore - Committed Instruction Count

Andrew Cebulski Sat, 03 Mar 2012 13:10:45 -0800

It looks like the TLB trace flag prints out the asid (in tlb.cc and
table_walker.cc)...are there others I should use instead or in addition?


Thanks,
Andrew

On Sat, Mar 3, 2012 at 3:12 PM, Ali Saidi <[email protected]> wrote:

> Hi Andrew,
>
> You could get a trace using the Exec debug flag and see where the extra
> instructions are coming from. You might want to sleep for 10 or 15 seconds
> before running your benchmark and see what happens. Since the solution
> validates, my guess is that the Linux scheduler isn't cooperating with you,
> but understanding where all these instructions are coming from is the only
> way to know for certain. You probably also want to use the trace flags that
> print out the asid so you can tell one app/pid from another.
>
> Ali
>
> Sent from my ARM powered mobile device
>
> On Mar 3, 2012, at 1:38 PM, Andrew Cebulski <[email protected]> wrote:
>
> Hi Ali,
>
>    The benchmark is libquantum from SPEC CPU2006.  The results are printed
> out to the system.terminal, so I am able to verify.  In all cases it passes
> with the exact same output.  Note that when I don't restore from a
> checkpoint...the committed instructions for the O3 CPU are roughly the same
> as atomic (within 100,000 instructions).
>
>    Yes, I did actually run atomic CPU with a checkpoint restore.  It
> resulted in 476085242 committed instructions...the exact same as without
> launching from a checkpoint.
>
>    I'll work on getting you results from another benchmark.  In the
> meantime, let me know if you have any other ideas.
>
> Thanks,
> Andrew
>
> On Sat, Mar 3, 2012 at 2:14 PM, Ali Saidi <[email protected]> wrote:
>
>> Hi Andrew,
>>
>> Are you sure the benchmark isn't timing dependent? Does the benchmark do
>> any kind of self-checking (e.g. the benchmark completes, but does it come to
>> the right answer)?
>>
>> Did you ever run the atomic cpu with a checkpoint restore? What is the
>> instruction count in this case?
>>
>> Thanks,
>> Ali
>>
>> On Mar 2, 2012, at 10:08 PM, Andrew Cebulski wrote:
>>
>> Okay, checker built and ran perfectly as far as I can tell.  Thanks!
>>
>> Here are the errors reported by the checker:
>>
>> warn: 3009947097500: Instruction results do not match! (Values may not
>> actually be integers) Inst: 0x2281c, checker: 0x281c
>> warn: 3015415839500: Instruction results do not match! (Values may not
>> actually be integers) Inst: 0x2281c, checker: 0x281c
>> warn: 3077134098000: Instruction results do not match! (Values may not
>> actually be integers) Inst: 0x2, checker: 0
>>
>> A grep shows this coming from src/cpu/checker/cpu_impl.hh
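One way to see how the two values in each warning differ is to XOR them. The sketch below is only an illustration and is not part of gem5: it assumes the three warnings have been rejoined onto single lines, and the regular expression and the helper name `parse_checker_warnings` are made up for this example (written against the text quoted above, not against the gem5 source).

```python
import re

# The three warnings quoted above, rejoined onto single lines.
LOG = """\
warn: 3009947097500: Instruction results do not match! (Values may not actually be integers) Inst: 0x2281c, checker: 0x281c
warn: 3015415839500: Instruction results do not match! (Values may not actually be integers) Inst: 0x2281c, checker: 0x281c
warn: 3077134098000: Instruction results do not match! (Values may not actually be integers) Inst: 0x2, checker: 0
"""

# Hypothetical pattern derived from the quoted text, not from gem5 itself.
PATTERN = re.compile(
    r"warn: (?P<tick>\d+): Instruction results do not match!.*?"
    r"Inst: (?P<inst>\S+), checker: (?P<checker>\S+)"
)


def parse_checker_warnings(text):
    """Return (tick, inst_value, checker_value, xor) for each mismatch."""
    rows = []
    for m in PATTERN.finditer(text):
        inst = int(m.group("inst"), 16)        # accepts "0x2281c" and "0"
        checker = int(m.group("checker"), 16)
        rows.append((int(m.group("tick")), inst, checker, inst ^ checker))
    return rows


for tick, inst, checker, diff in parse_checker_warnings(LOG):
    print("%d: inst=%#x checker=%#x xor=%#x" % (tick, inst, checker, diff))
```

For the first two warnings the XOR is 0x20000, so the two values differ in a single bit (bit 17); for the third it is 0x2. That narrows down what differs, though not why.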
>>
>> My benchmark ran to completion with the following results:
>>
>> Detailed CPU (checkpoint restore):
>>
>>   system.switch_cpus_1.committedInsts          = 610834324
>>   system.switch_cpus_1.committedOps (new stat) = 646803879
>>     (this is close to what the committed instructions were before...)
>>   system.switch_cpus_1.fetch.Insts             = 632688924
>>
>> What's the next step in finding the source of this error?
>>
>> Thanks,
>> Andrew
>>
>> On Fri, Mar 2, 2012 at 5:04 PM, Andrew Cebulski <[email protected]> wrote:
>>
>>> This probably happened because I merged into rev 8877 instead of rev
>>> 8861.  The patch merged fine with rev 8861, so none of my local changes
>>> conflicted.  I'm building now.  I'll send an update later when I'm blocked
>>> again.
>>>
>>> I actually just tried gcc 4.6.2 recently, so I experienced that swig
>>> error with ptrdiff_t.  Glad to see that was fixed in rev 8861.
>>>
>>> -Andrew
>>>
>>>
>>> On Fri, Mar 2, 2012 at 3:36 PM, Andrew Cebulski <[email protected]> wrote:
>>>
>>>> Okay, so I'm trying to build after patching this from the review
>>>> board:  http://reviews.m5sim.org/r/1031/
>>>>
>>>> There were a few minor merge issues with the patch, but they all seemed
>>>> easily resolved.  I'm merging this into gem5 revision 8884 (today).
>>>> Unfortunately, I'm getting this error:
>>>>
>>>>  [     CXX] ARM/cpu/checker/cpu.cc -> .fo
>>>> build/ARM/cpu/checker/cpu.cc: In member function 'void
>>>> CheckerCPU::setSystem(System*)':
>>>> build/ARM/cpu/checker/cpu.cc:106:43: error: no matching function for
>>>> call to 'SimpleThread::SimpleThread(CheckerCPU* const, int, System*&,
>>>> Process*, ArmISA::TLB*&, ArmISA::TLB*&)'
>>>> build/ARM/cpu/simple_thread.hh:142:5: note: candidates are:
>>>> SimpleThread::SimpleThread()
>>>> build/ARM/cpu/simple_thread.hh:139:5: note:
>>>> SimpleThread::SimpleThread(BaseCPU*, int, Process*, ArmISA::TLB*,
>>>> ArmISA::TLB*)
>>>> build/ARM/cpu/simple_thread.hh:135:5: note:
>>>> SimpleThread::SimpleThread(BaseCPU*, int, System*, ArmISA::TLB*,
>>>> ArmISA::TLB*, bool)
>>>> build/ARM/cpu/simple_thread.hh:96:1: note:
>>>> SimpleThread::SimpleThread(const SimpleThread&)
>>>> build/ARM/cpu/checker/cpu.cc: In member function 'Fault
>>>> CheckerCPU::readMem(Addr, uint8_t*, unsigned int, unsigned int)':
>>>> build/ARM/cpu/checker/cpu.cc:156:47: error: 'masterId' was not declared
>>>> in this scope
>>>> scons: *** [build/ARM/cpu/checker/cpu.fo] Error 1
>>>>
>>>> I tried patching to a repo I have with revision 8813 and received the
>>>> same error.  Are there some other patches from the reviewboard that I
>>>> should be including?
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>
>>>> On Fri, Mar 2, 2012 at 12:37 PM, Andrew Cebulski <[email protected]> wrote:
>>>>
>>>>> Geoff,
>>>>>
>>>>>    Okay, but it looks to me like that error is correctable.  I think
>>>>> that the m5.instantiate(checkpoint_dir) should only happen within the 'if
>>>>> options.checkpoint_restore != None:' statement (so it needs an extra tab).
>>>>>  As it is in the repository, it happens regardless of whether or not you
>>>>> are restoring from a checkpoint.  So you're essentially doing
>>>>> m5.instantiate(None).
>>>>>
>>>>> -Andrew
>>>>>
>>>>>
>>>>> On Fri, Mar 2, 2012 at 12:23 PM, Geoffrey Blake <[email protected]> wrote:
>>>>>
>>>>>> Andrew,
>>>>>>
>>>>>> You may want to wait until the most recent patches for the checker are
>>>>>> pushed; they will allow you to just specify --checker on the command
>>>>>> line.  I forgot that the checker, as it is now in the tree, was broken
>>>>>> during a recent merge with other changes.  Or, if you go to M5's
>>>>>> reviewboard, you can grab the patches for the checker and apply them.
>>>>>>
>>>>>> Geoff
>>>>>>
>>>>>> On Fri, Mar 2, 2012 at 11:17 AM, Andrew Cebulski <[email protected]>
>>>>>> wrote:
>>>>>> > I'm getting the following error when running this basic command with
>>>>>> > the CPU Checker enabled:
>>>>>> >
>>>>>> > build/ARM/gem5.fast configs/example/fs.py -b ArmUbuntu --cpu-type=detailed --caches
>>>>>> >
>>>>>> > Error in unproxying param 'workload' of system.cpu.checker
>>>>>> > Traceback (most recent call last):
>>>>>> >   File "<string>", line 1, in ?
>>>>>> >   File "/gem5/src/python/m5/main.py", line 361, in main
>>>>>> >     exec filecode in scope
>>>>>> >   File "configs/example/fs.py", line 215, in ?
>>>>>> >     Simulation.run(options, root, test_sys, FutureClass)
>>>>>> >   File "/gem5/configs/common/Simulation.py", line 246, in run
>>>>>> >     m5.instantiate(checkpoint_dir)
>>>>>> >   File "/gem5/src/python/m5/simulate.py", line 66, in instantiate
>>>>>> >     for obj in root.descendants(): obj.unproxyParams()
>>>>>> >   File "/gem5/src/python/m5/SimObject.py", line 851, in unproxyParams
>>>>>> >     value = value.unproxy(self)
>>>>>> >   File "/gem5/src/python/m5/params.py", line 196, in unproxy
>>>>>> >     return [v.unproxy(base) for v in self]
>>>>>> >   File "/gem5/src/python/m5/proxy.py", line 89, in unproxy
>>>>>> >     result, done = self.find(obj)
>>>>>> >   File "/gem5/src/python/m5/proxy.py", line 162, in find
>>>>>> >     val = val[m]
>>>>>> > IndexError: list index out of range
>>>>>> >
>>>>>> > Any idea why this is happening?  I'm not even attempting to launch from
>>>>>> > a checkpoint here (though this exact error also occurs when I attempt
>>>>>> > to restore from a checkpoint now).  Some notes on my environment: I'm
>>>>>> > running Python 2.4.3, SWIG 1.3.40 and GCC 4.5.3.
>>>>>> >
>>>>>> > Note that when I run the atomic/timing CPUs, I get a segmentation fault.
>>>>>> > I'm assuming this is because they don't have checkers set up in the
>>>>>> > code.  Let me know if otherwise.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Andrew
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Mar 1, 2012 at 5:00 PM, Ali Saidi <[email protected]> wrote:
>>>>>> >>
>>>>>> >> Hi Andrew,
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> You should be able to re-compile gem5 with USE_CHECKER=1 on the command
>>>>>> >> line and it will include the checker and run it when you restore to the
>>>>>> >> o3 cpu.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >>
>>>>>> >> Ali
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On 01.03.2012 14:02, Andrew Cebulski wrote:
>>>>>> >>
>>>>>> >> Hi Ali,
>>>>>> >>
>>>>>> >>     Okay, thanks, I'll try out the checker cpu.  Is this the best resource
>>>>>> >> available on how to use the Checker CPU?  -- http://gem5.org/Checker
>>>>>> >>     Also, my run restoring the O3 CPU from my checkpoint has the same
>>>>>> >> result:
>>>>>> >>     Detailed CPU (checkpoint restore):
>>>>>> >>       system.cpu.committedInsts = 646985567
>>>>>> >>       system.cpu.fetch.Insts    = 648951747
>>>>>> >> Thanks,
>>>>>> >> Andrew
>>>>>> >>
>>>>>> >> On Thu, Mar 1, 2012 at 2:40 PM, Ali Saidi <[email protected]> wrote:
>>>>>> >>>
>>>>>> >>> Hi Andrew,
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> The first guess is that the cpu possibly results in a different code
>>>>>> >>> path or different scheduler decisions, which lengthen execution. Another
>>>>>> >>> possibility is that the O3 cpu as configured by the arm-detailed
>>>>>> >>> configuration has some issue. While this is possible, it's not incredibly
>>>>>> >>> likely. You could try to restore from the checkpoint and run with the
>>>>>> >>> checker cpu. This creates a little atomic-like cpu that sits next to the
>>>>>> >>> o3 core and verifies its execution, which might tell you if there is a
>>>>>> >>> bug in the o3 model.
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Thanks,
>>>>>> >>>
>>>>>> >>> Ali
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On 01.03.2012 13:04, Andrew Cebulski wrote:
>>>>>> >>>
>>>>>> >>> Hi,
>>>>>> >>>     I'm experiencing some problems that I am currently attributing to
>>>>>> >>> restoring from a checkpoint, then switching to an arm_detailed CPU
>>>>>> >>> (O3_ARM_v7a_3).  I first noticed the problem because my committed
>>>>>> >>> instruction counts don't line up between different CPUs for a
>>>>>> >>> benchmark I'm running (they are off by roughly 170M instructions).
>>>>>> >>> The stats below are reset right before running the benchmark, then
>>>>>> >>> dumped afterwards:
>>>>>> >>>     Atomic CPU (no checkpoint restore):
>>>>>> >>>       system.cpu.numInsts                 = 476085242
>>>>>> >>>     Detailed CPU (no checkpoint restore):
>>>>>> >>>       system.cpu.committedInsts           = 476128320
>>>>>> >>>       system.cpu.fetch.Insts              = 478463491
>>>>>> >>>     Arm_detailed CPU (checkpoint restore):
>>>>>> >>>       system.switch_cpus_1.committedInsts = 646468886
>>>>>> >>>       system.switch_cpus_1.fetch.Insts    = 660969371
>>>>>> >>>     Arm_detailed CPU (no checkpoint restore):
>>>>>> >>>       system.cpu.committedInsts           = 476107801
>>>>>> >>>       system.cpu.fetch.Insts              = 491814681
>>>>>> >>>     I included both the committed and fetched instructions to see if
>>>>>> >>> the problem is with fetches getting counted as committed even when they
>>>>>> >>> are not (i.e. insts not getting squashed).  That does not seem to be the
>>>>>> >>> case from the stats above, as the arm_detailed run without a checkpoint
>>>>>> >>> has roughly the same difference between fetched and committed
>>>>>> >>> instructions.  I noticed that the switch arm_detailed cpu, when
>>>>>> >>> restoring from a checkpoint, lacks both an icache and a dcache as
>>>>>> >>> children, but I read in a previous post that they are connected to
>>>>>> >>> fetch/iew respectively, so this is probably not the issue.  I assume
>>>>>> >>> it's just not shown explicitly in the config.ini file...
>>>>>> >>>     I'm running a test right now to see if switching to a regular
>>>>>> >>> DerivO3CPU has the same issue.  Regardless of its results, does anyone
>>>>>> >>> have any idea why I'm seeing roughly 170M more committed instructions
>>>>>> >>> in the arm_detailed CPU run when I restore from a checkpoint?  I've
>>>>>> >>> attached my config file from the arm_detailed with checkpoint run for
>>>>>> >>> reference.
>>>>>> >>>     Here's the run command for when I use a checkpoint:
>>>>>> >>>     build/ARM/gem5.fast -d [dir] configs/example/fs.py -b [benchmark] -r 1 --checkpoint-dir=[chkpt-dir] --caches -s
>>>>>> >>>     Lastly, I'm running off of revision 8813 from 2/3/12.  Let me know
>>>>>> >>> if you need any more info (e.g. stats).
>>>>>> >>> Thanks,
>>>>>> >>> Andrew
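The counts reported in this message can be compared directly. The sketch below is only arithmetic on the numbers quoted above; the dictionary layout, the run labels, and the helper name `summarize` are chosen for this example and are not gem5 output or gem5 API.

```python
# (committed, fetched) instruction counts as reported in this message.
# The atomic CPU reports only numInsts, so its fetched count is None.
RUNS = {
    "atomic, no restore":       (476085242, None),
    "detailed, no restore":     (476128320, 478463491),
    "arm_detailed, restore":    (646468886, 660969371),
    "arm_detailed, no restore": (476107801, 491814681),
}

# Use the atomic run as the reference committed-instruction count.
BASELINE = RUNS["atomic, no restore"][0]


def summarize(runs, base):
    """Return {label: (committed - base, fetched - committed or None)}."""
    out = {}
    for label, (committed, fetched) in runs.items():
        not_committed = None if fetched is None else fetched - committed
        out[label] = (committed - base, not_committed)
    return out


for label, (extra, not_committed) in summarize(RUNS, BASELINE).items():
    print("%-26s extra committed: %11d  fetched - committed: %s"
          % (label, extra, not_committed))
```

The restored arm_detailed run commits about 170M more instructions than the atomic baseline, while fetched minus committed is about 14.5M with the restore and about 15.7M without it, which is consistent with the observation that unsquashed fetches do not explain the gap.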
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> _______________________________________________
>>>>>> >>> gem5-users mailing list
>>>>>> >>> [email protected]
>>>>>> >>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
