Re: [m5-dev] Implementing checkpointing for inorder

nathan binkert Sun, 04 Jul 2010 00:29:36 -0700

Why are you trying to checkpoint the InOrderCPU?  Wouldn't it be
better to implement the switchover from SimpleCPU to InOrder?  You
can't checkpoint caches right now, so it doesn't seem worthwhile to
checkpoint inorder.


  Nate

On Sat, Jul 3, 2010 at 1:56 PM, soumyaroop roy <s...@cse.usf.edu> wrote:
> Hello there:
>
> I am revisiting an earlier suspended effort to implement checkpointing
> for the inorder cpu and I am currently debugging a problem (for the
> case of a uniprocessor and no multithreading). Let me describe the
> problem here.
>
> I am using the hello world program. I am taking a checkpoint at
> instruction 100 (by specifying --take-checkpoint=100 --at-instruction)
> and then restoring from there and running another 100 instructions. I
> generated a trace of ONLY the retired instructions from a separate run
> of the inorder cpu that retires 200 instructions and compared that
> trace with the traces generated by the checkpointing and checkpoint
> restoration steps. I see that there is a bug in the simulation of the
> 76th instruction after restoration of the program (a load instruction
> loads a 1 instead of a 0) that causes the problem.
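
The trace comparison described above can be sketched as follows. This is only a minimal illustration, not part of M5: `strip_tick` and `first_divergence` are hypothetical helper names, and the sketch assumes each trace line begins with a `<tick>: ` prefix that may differ between runs and should be ignored. The `skip` argument drops the instructions retired before the checkpoint, so the restored run is compared against the matching tail of the reference run.

```python
from itertools import zip_longest


def strip_tick(line):
    # Assumption: a trace line looks like "<tick>: <rest>". Drop the tick,
    # since absolute ticks can differ between a straight run and a restore.
    head, sep, rest = line.partition(": ")
    return rest if sep and head.strip().isdigit() else line


def first_divergence(reference, candidate, skip=0):
    """Return (index, ref_line, cand_line) for the first mismatch, or None.

    `skip` is the number of reference lines to drop from the front
    (e.g. the instructions retired before the checkpoint was taken).
    A trace that ends early shows up as a mismatch against None.
    """
    ref = [strip_tick(l.rstrip("\n")) for l in reference][skip:]
    cand = [strip_tick(l.rstrip("\n")) for l in candidate]
    for i, (r, c) in enumerate(zip_longest(ref, cand)):
        if r != c:
            return i, r, c
    return None


if __name__ == "__main__":
    # Illustrative, made-up trace lines (not real M5 output).
    full_run = ["100: ld r1 D=0", "200: add r2", "300: ld r3 D=0"]
    restored = ["50: add r2", "90: ld r3 D=1"]
    print(first_divergence(full_run, restored, skip=1))
```

With real trace files, the same function can be given the lists returned by `readlines()` on each file.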
>
> Now, this is my understanding of how a checkpoint is taken. Please
> correct me if I am wrong. I noted that when checkpointing is specified
> with these options: "--take-checkpoint=N --at-instruction", the
> max_insts_any_thread for the cpu is set to N which sets up a
> termination event in the committed instructions queue,
> comInstEventQueue (let's consider a uniprocessor and no
> multithreading). After each instruction is retired the events from
> this queue are serviced. So, when N instructions have been committed,
> the drain() routine is called. The simulation is exited subsequently.
> Then the writing of the checkpoint is directed by the python script,
> Simulation.py. The serialize() routine should be called before the
> simulation is exited, right? Also, the total number of retired
> instructions can be more than N eventually, right?
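
The mechanism described in the paragraph above can be modeled with a small toy. This sketch is not M5 code: `CommittedInstQueue` and `ToyCPU` are hypothetical classes, and `commit_width` is an assumed parameter standing in for a CPU that can retire several instructions in one cycle. The toy only shows why, if the exit request is acted on at a cycle boundary, the retired count at exit can exceed N.

```python
import heapq


class CommittedInstQueue:
    """Toy event queue keyed on the committed-instruction count."""

    def __init__(self):
        self._events = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def schedule(self, inst_count, callback):
        heapq.heappush(self._events, (inst_count, self._seq, callback))
        self._seq += 1

    def service(self, committed):
        # Fire every event whose instruction count has been reached.
        while self._events and self._events[0][0] <= committed:
            _, _, callback = heapq.heappop(self._events)
            callback()


class ToyCPU:
    def __init__(self, max_insts, commit_width=1):
        self.committed = 0
        self.exit_requested = False
        self.commit_width = commit_width  # assumed: retirements per cycle
        self.queue = CommittedInstQueue()
        self.queue.schedule(max_insts, self._request_exit)

    def _request_exit(self):
        self.exit_requested = True

    def cycle(self):
        # The queue is serviced after each retirement, but in this toy the
        # exit request only takes effect once the cycle has finished.
        for _ in range(self.commit_width):
            self.committed += 1
            self.queue.service(self.committed)

    def run(self):
        while not self.exit_requested:
            self.cycle()
        return self.committed


if __name__ == "__main__":
    print(ToyCPU(max_insts=100).run())
    print(ToyCPU(max_insts=100, commit_width=3).run())
```

In this toy, a commit width of 1 stops at exactly N, while a wider commit overshoots by up to `commit_width - 1` instructions. Whether a given M5 CPU model behaves this way is an assumption to check against its commit logic.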
>
> Here is another observation which is a bit confusing to me. I traced
> the routines that are called during O3's checkpointing and the
> resume() routine is called when the checkpoint is taken (after drain()
> and serialize() routines). Why is this happening? Shouldn't resume()
> be called while restoring from a checkpoint after the unserialize()
> routine is called?
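
One possible reading of the call order in question, offered as an assumption rather than a description of M5's source: if the simulator is allowed to keep running after a checkpoint is written, then the objects that were drained for the checkpoint have to be resumed afterwards, so resume() would appear on both the checkpoint path and the restore path. The toy below uses a hypothetical `ToySimObject` to show the two sequences side by side.

```python
class ToySimObject:
    """Toy object that records the order of its lifecycle calls."""

    def __init__(self, value=0):
        self.value = value
        self.running = True
        self.calls = []

    def drain(self):
        self.calls.append("drain")
        self.running = False

    def serialize(self):
        # State is only captured once the object is quiescent.
        assert not self.running, "serialize() expects a drained object"
        self.calls.append("serialize")
        return {"value": self.value}

    def unserialize(self, checkpoint):
        self.calls.append("unserialize")
        self.value = checkpoint["value"]
        self.running = False

    def resume(self):
        self.calls.append("resume")
        self.running = True


def take_checkpoint(obj):
    # Assumed order: drain, serialize, then resume so simulation can go on.
    obj.drain()
    checkpoint = obj.serialize()
    obj.resume()
    return checkpoint


def restore_checkpoint(checkpoint):
    # Assumed order on restore: unserialize, then resume.
    obj = ToySimObject()
    obj.unserialize(checkpoint)
    obj.resume()
    return obj


if __name__ == "__main__":
    source = ToySimObject(value=7)
    restored = restore_checkpoint(take_checkpoint(source))
    print(source.calls, restored.calls, restored.value)
```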
>
> regards,
> Soumyaroop
>
>
> On Fri, Feb 12, 2010 at 12:05 PM, Korey Sewell <ksew...@umich.edu> wrote:
>>> But fixing the two items above did not solve the problem. I figured
>>> (from the takeoverfrom() routines) that the commit stage needs to reset
>>> its flags so that it does not go and squash the first instruction
>>> where the restoration is supposed to start from. Since I am not very
>>> familiar with the O3 code, I did not spend much time looking into it.
>>
>> I'm assuming O3 doesn't get to commit even 1 instruction, because it's
>> immediately squashed as soon as you restore from the checkpoint?
>>
>>
>>>
>>> So, now I am seeing inorder proceed to about 100 instructions after
>>> which the PC is set to 0x0 (following a squash). I have to look into
>>> it later. Which trace flags should I use to see the actual
>>> instructions?
>>
>> "Exec" if you want just the committed instructions
>>
>> --
>> - Korey
>>
>> _______________________________________________
>> m5-dev mailing list
>> m5-dev@m5sim.org
>> http://m5sim.org/mailman/listinfo/m5-dev
>>
>>
>
>
>
> --
> Soumyaroop Roy
> Ph.D. Candidate
> Department of Computer Science and Engineering
> University of South Florida, Tampa
> http://www.csee.usf.edu/~sroy