Hi Ali,

   The benchmark is libquantum from SPEC CPU2006.  The results are printed
out to the system.terminal, so I am able to verify them.  In all cases it
passes with the exact same output.  Note that when I don't restore from a
checkpoint, the committed instruction count for the O3 CPU is roughly the
same as for atomic (within 100,000 instructions).

   Yes, I did actually run the atomic CPU with a checkpoint restore.  It
resulted in 476085242 committed instructions, exactly the same as without
launching from a checkpoint.
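
   For reference, the committed-instruction counts quoted in this thread can
be lined up against the atomic baseline with a short script.  This is only a
minimal sketch: it is a standalone Python 3 script (not a gem5 config), the
run labels and the helper name deltas_vs_baseline are made up for
illustration, and the numbers are copied by hand from the messages below
rather than parsed from stats.txt.

```python
# Committed-instruction counts as reported in this thread (libquantum, ARM).
# Labels are illustrative; values are copied from the quoted messages.
counts = {
    "atomic, no restore": 476085242,
    "atomic, restore": 476085242,
    "detailed, no restore": 476128320,
    "arm_detailed, no restore": 476107801,
    "arm_detailed, restore": 646468886,
    "detailed, restore": 646985567,
}


def deltas_vs_baseline(counts, baseline_key):
    """Return each run's committed-instruction count minus the baseline's."""
    base = counts[baseline_key]
    return {name: value - base for name, value in counts.items()}


deltas = deltas_vs_baseline(counts, "atomic, no restore")
for name, delta in deltas.items():
    # Signed, right-aligned difference relative to the atomic run.
    print(f"{name:28s} {delta:+12d}")
```

Only the O3 runs restored from a checkpoint stand out (roughly 170M extra
instructions); every other run stays within 100,000 of the atomic count.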

   I'll work on getting you results from another benchmark.  In the
meantime, let me know if you have any other ideas.

Thanks,
Andrew

On Sat, Mar 3, 2012 at 2:14 PM, Ali Saidi <[email protected]> wrote:

> Hi Andrew,
>
> Are you sure the benchmark isn't timing dependent? Does the benchmark do
> any kind of self-checking (e.g. the benchmark completes, but does it come to
> the right answer)?
>
> Did you ever run the atomic cpu with a checkpoint restore? What is the
> instruction count in this case?
>
> Thanks,
> Ali
>
> On Mar 2, 2012, at 10:08 PM, Andrew Cebulski wrote:
>
> Okay, checker built and ran perfectly as far as I can tell.  Thanks!
>
> Here are the errors reported by the checker:
>
> warn: 3009947097500: Instruction results do not match! (Values may not
> actually be integers) Inst: 0x2281c, checker: 0x281c
> warn: 3015415839500: Instruction results do not match! (Values may not
> actually be integers) Inst: 0x2281c, checker: 0x281c
> warn: 3077134098000: Instruction results do not match! (Values may not
> actually be integers) Inst: 0x2, checker: 0
>
> A grep shows this coming from src/cpu/checker/cpu_impl.hh
>
> My benchmark ran to completion with the following results:
>
> Detailed CPU (checkpoint restore):
>     system.switch_cpus_1.committedInsts          = 610834324
>     system.switch_cpus_1.committedOps (new stat) = 646803879
>       (this is close to what the committed instructions were before...)
>     system.switch_cpus_1.fetch.Insts             = 632688924
>
> What's the next step in finding the source of this error?
>
> Thanks,
> Andrew
>
> On Fri, Mar 2, 2012 at 5:04 PM, Andrew Cebulski <[email protected]> wrote:
>
>> This probably happened because I merged into rev 8877 instead of rev
>> 8861.  The patch merged fine with rev 8861, so none of my local changes
>> conflicted.  I'm building now.  I'll send an update later when I'm blocked
>> again.
>>
>> I actually just tried gcc 4.6.2 recently, so I experienced that swig
>> error with ptrdiff_t.  Glad to see that was fixed in rev 8861.
>>
>> -Andrew
>>
>>
>> On Fri, Mar 2, 2012 at 3:36 PM, Andrew Cebulski <[email protected]> wrote:
>>
>>> Okay, so I'm trying to build after patching this from the review board:
>>> http://reviews.m5sim.org/r/1031/
>>>
>>> There were a few minor merge issues with the patch, but they all seemed
>>> easily resolved.  I'm merging this into gem5 revision 8884 (today).
>>> Unfortunately, I'm getting this error:
>>>
>>>  [     CXX] ARM/cpu/checker/cpu.cc -> .fo
>>> build/ARM/cpu/checker/cpu.cc: In member function 'void
>>> CheckerCPU::setSystem(System*)':
>>> build/ARM/cpu/checker/cpu.cc:106:43: error: no matching function for
>>> call to 'SimpleThread::SimpleThread(CheckerCPU* const, int, System*&,
>>> Process*, ArmISA::TLB*&, ArmISA::TLB*&)'
>>> build/ARM/cpu/simple_thread.hh:142:5: note: candidates are:
>>> SimpleThread::SimpleThread()
>>> build/ARM/cpu/simple_thread.hh:139:5: note:
>>> SimpleThread::SimpleThread(BaseCPU*, int, Process*, ArmISA::TLB*,
>>> ArmISA::TLB*)
>>> build/ARM/cpu/simple_thread.hh:135:5: note:
>>> SimpleThread::SimpleThread(BaseCPU*, int, System*, ArmISA::TLB*,
>>> ArmISA::TLB*, bool)
>>> build/ARM/cpu/simple_thread.hh:96:1: note:
>>> SimpleThread::SimpleThread(const SimpleThread&)
>>> build/ARM/cpu/checker/cpu.cc: In member function 'Fault
>>> CheckerCPU::readMem(Addr, uint8_t*, unsigned int, unsigned int)':
>>> build/ARM/cpu/checker/cpu.cc:156:47: error: 'masterId' was not declared
>>> in this scope
>>> scons: *** [build/ARM/cpu/checker/cpu.fo] Error 1
>>>
>>> I tried patching to a repo I have with revision 8813 and received the
>>> same error.  Are there some other patches from the reviewboard that I
>>> should be including?
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>> On Fri, Mar 2, 2012 at 12:37 PM, Andrew Cebulski <[email protected]> wrote:
>>>
>>>> Geoff,
>>>>
>>>>    Okay, but it looks to me like that error is correctable.  I think
>>>> that the m5.instantiate(checkpoint_dir) should only happen within the 'if
>>>> options.checkpoint_restore != None:' statement (so it needs an extra tab).
>>>>  As it is in the repository, it happens regardless of whether or not you
>>>> are restoring from a checkpoint.  So you're essentially doing
>>>> m5.instantiate(None).
>>>>
>>>> -Andrew
>>>>
>>>>
>>>> On Fri, Mar 2, 2012 at 12:23 PM, Geoffrey Blake <[email protected]> wrote:
>>>>
>>>>> Andrew,
>>>>>
>>>>> You may want to wait until the most recent patches for the checker are
>>>>> pushed; they will allow you to just specify --checker on the command
>>>>> line.  I forgot that the checker, as it is now in the tree, was broken
>>>>> during a recent merge with other changes.  Or, if you go to M5's
>>>>> reviewboard, you can grab the patches for the checker and apply them.
>>>>>
>>>>> Geoff
>>>>>
>>>>> On Fri, Mar 2, 2012 at 11:17 AM, Andrew Cebulski <[email protected]>
>>>>> wrote:
>>>>> > I'm getting the following error when running this basic command with
>>>>> > the CPU Checker enabled:
>>>>> >
>>>>> > build/ARM/gem5.fast configs/example/fs.py -b ArmUbuntu --cpu-type=detailed --caches
>>>>> >
>>>>> > Error in unproxying param 'workload' of system.cpu.checker
>>>>> > Traceback (most recent call last):
>>>>> >   File "<string>", line 1, in ?
>>>>> >   File "/gem5/src/python/m5/main.py", line 361, in main
>>>>> >     exec filecode in scope
>>>>> >   File "configs/example/fs.py", line 215, in ?
>>>>> >     Simulation.run(options, root, test_sys, FutureClass)
>>>>> >   File "/gem5/configs/common/Simulation.py", line 246, in run
>>>>> >     m5.instantiate(checkpoint_dir)
>>>>> >   File "/gem5/src/python/m5/simulate.py", line 66, in instantiate
>>>>> >     for obj in root.descendants(): obj.unproxyParams()
>>>>> >   File "/gem5/src/python/m5/SimObject.py", line 851, in unproxyParams
>>>>> >     value = value.unproxy(self)
>>>>> >   File "/gem5/src/python/m5/params.py", line 196, in unproxy
>>>>> >     return [v.unproxy(base) for v in self]
>>>>> >   File "/gem5/src/python/m5/proxy.py", line 89, in unproxy
>>>>> >     result, done = self.find(obj)
>>>>> >   File "/gem5/src/python/m5/proxy.py", line 162, in find
>>>>> >     val = val[m]
>>>>> > IndexError: list index out of range
>>>>> >
>>>>> > Any idea why this is happening?  I'm not even attempting to launch
>>>>> > from a checkpoint here (though this exact error also occurs when
>>>>> > restoring from a checkpoint now).  Some notes on my environment:  I'm
>>>>> > running Python 2.4.3, SWIG 1.3.40 and GCC 4.5.3.
>>>>> >
>>>>> > Note that when I run atomic/timing CPUs, I get a segmentation fault.
>>>>> > I'm assuming this is because they don't have checkers set up in the
>>>>> > code.  Let me know if otherwise.
>>>>> >
>>>>> > Thanks,
>>>>> > Andrew
>>>>> >
>>>>> >
>>>>> > On Thu, Mar 1, 2012 at 5:00 PM, Ali Saidi <[email protected]> wrote:
>>>>> >>
>>>>> >> Hi Andrew,
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> You should be able to re-compile gem5 with USE_CHECKER=1 on the
>>>>> >> command line and it will include the checker and run it when you
>>>>> >> restore to the o3 cpu.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> Thanks,
>>>>> >>
>>>>> >> Ali
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On 01.03.2012 14:02, Andrew Cebulski wrote:
>>>>> >>
>>>>> >> Hi Ali,
>>>>> >>
>>>>> >>     Okay, thanks, I'll try out the checker cpu.  Is this the best
>>>>> >> resource available on how to use the Checker CPU?  --
>>>>> >> http://gem5.org/Checker
>>>>> >>     Also, my run restoring the O3 CPU from my checkpoint has the
>>>>> >> same result:
>>>>> >>     Detailed CPU (checkpoint restore):
>>>>> >>         system.cpu.committedInsts = 646985567
>>>>> >>         system.cpu.fetch.Insts    = 648951747
>>>>> >> Thanks,
>>>>> >> Andrew
>>>>> >>
>>>>> >> On Thu, Mar 1, 2012 at 2:40 PM, Ali Saidi <[email protected]> wrote:
>>>>> >>>
>>>>> >>> Hi Andrew,
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> The first guess is that possibly the cpu results in a different
>>>>> >>> code path or different scheduler decisions which lengthen
>>>>> >>> execution. Another possibility is that the O3 cpu as configured by
>>>>> >>> the arm-detailed configuration has some issue. While this is
>>>>> >>> possible, it's not incredibly likely. You could try to restore from
>>>>> >>> the checkpoint and run with the checker cpu. This creates a little
>>>>> >>> atomic-like cpu that sits next to the o3 core and verifies its
>>>>> >>> execution, which might tell you if there is a bug in the o3 model.
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>>
>>>>> >>> Ali
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On 01.03.2012 13:04, Andrew Cebulski wrote:
>>>>> >>>
>>>>> >>> Hi,
>>>>> >>>     I'm experiencing some problems that I am currently attributing
>>>>> >>> to restoring from a checkpoint, then switching to an arm_detailed
>>>>> >>> CPU (O3_ARM_v7a_3).  I first noticed the problem because my
>>>>> >>> committed instruction counts do not line up between different CPUs
>>>>> >>> for a benchmark I'm running (they differ by roughly 170M
>>>>> >>> instructions).  The stats below are reset right before running the
>>>>> >>> benchmark, then dumped afterwards:
>>>>> >>>     Atomic CPU (no checkpoint restore):
>>>>> >>>         system.cpu.numInsts                 = 476085242
>>>>> >>>     Detailed CPU (no checkpoint restore):
>>>>> >>>         system.cpu.committedInsts           = 476128320
>>>>> >>>         system.cpu.fetch.Insts              = 478463491
>>>>> >>>     Arm_detailed CPU (checkpoint restore):
>>>>> >>>         system.switch_cpus_1.committedInsts = 646468886
>>>>> >>>         system.switch_cpus_1.fetch.Insts    = 660969371
>>>>> >>>     Arm_detailed CPU (no checkpoint restore):
>>>>> >>>         system.cpu.committedInsts           = 476107801
>>>>> >>>         system.cpu.fetch.Insts              = 491814681
>>>>> >>>     I included both the committed and fetched instructions to see
>>>>> >>> whether the problem is fetches getting counted as committed even
>>>>> >>> when they are not (i.e. insts not getting squashed).  That does not
>>>>> >>> seem to be the case from the stats above, as the arm_detailed run
>>>>> >>> without a checkpoint has roughly the same difference between
>>>>> >>> fetched and committed instructions.  I noticed that the switched
>>>>> >>> arm_detailed cpu, when restoring from a checkpoint, lacks both an
>>>>> >>> icache and a dcache as children, but I read in a previous post that
>>>>> >>> they are connected to fetch/iew respectively, so this is probably
>>>>> >>> not the issue.  I assume it's just not shown explicitly in the
>>>>> >>> config.ini file...
>>>>> >>>     I'm running a test right now to see if switching to a regular
>>>>> >>> DerivO3CPU has the same issue.  Regardless of its results, does
>>>>> >>> anyone have any idea why I'm seeing roughly 170M more committed
>>>>> >>> instructions in the arm_detailed CPU run when I restore from a
>>>>> >>> checkpoint?  I've attached my config file from the arm_detailed
>>>>> >>> with checkpoint run for reference.
>>>>> >>>     Here's the run command for when I use a checkpoint:
>>>>> >>>     build/ARM/gem5.fast -d [dir] configs/example/fs.py -b [benchmark] -r 1 --checkpoint-dir=[chkpt-dir] --caches -s
>>>>> >>>     Lastly, I'm running off of revision 8813 from 2/3/12.  Let me
>>>>> >>> know if you need any more info (i.e. stats).
>>>>> >>> Thanks,
>>>>> >>> Andrew
>>>>> >>>
>>>>> >>> _______________________________________________
>>>>> >>> gem5-users mailing list
>>>>> >>> [email protected]
>>>>> >>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users