Re: [gem5-dev] Review Request 2908: ruby: Fix checkpointing and restore

Timothy Jones Wed, 24 Jun 2015 02:37:09 -0700


On June 24, 2015, 7:49 a.m., Timothy Jones wrote:
> > I do not think this is the way to go. There is already an established 
> > methodology to solve the issue.
> 
> Timothy Jones wrote:
>     I don't think draining is the way forward, but there could well be other, 
> better solutions than the one I've got here.  My aim was to get this working 
> again, which it does, but I'm happy now to rework it into something more 
> acceptable.
>     
>     From my understanding, draining is there to remove transient state from 
> objects.  In the classic memory model caches keep dirty data around after 
> draining, so I don't see why ruby shouldn't either.
>     
>     However, in the classic model, a call to memWriteback() is made to flush 
> data back to memory before checkpointing.  Perhaps it would be better if I 
> implemented ruby's flushing in a similar function?  The main problem with 
> that is that ruby's flushing is tied up with creating a trace for 
> checkpointing, so if someone wanted to call memWriteback() without then 
> creating a checkpoint, they'd end up with an unnecessary trace file.
> 
> Andreas Sandberg wrote:
>     I agree with Andreas here. Having checkpoint priorities seems like a 
> really bad idea.
>     
>     You're right that the expectation is that drain does not flush or write 
> back caches. The simulator is drained when switching CPU models (there are 
> cases where we write back and invalidate caches here, but I don't think those 
> are supported by Ruby) and you really don't want to lose your cache in that 
> case.
>     
>     The flow when taking a checkpoint is slightly different since we assume 
> that caches in the classic memory system can't be checkpointed. This means 
> that the simulator goes through the following flow:
>         * drain() (called multiple times until the whole simulator is drained)
>         * memWriteback()
>         * serialize()
>         * drainResume()
>     
>     I *think* the right solution here would be to write back dirty data to 
> the memory system in memWriteback() and create the trace in serialize(). This 
> would ensure that the backing store is in sync with Ruby when you take the 
> checkpoint. Note that you aren't required to change the state of dirty cache 
> lines in memWriteback(); this would allow you to retain the old state in the 
> trace.
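
To make the ordering in the quoted flow concrete, here is a toy sketch in 
Python.  It is not gem5 code: ToyMemory, ToyCache and take_checkpoint are 
hypothetical names, and the only thing it tries to show is that writing dirty 
data back before serializing keeps the backing store in sync while the cache 
lines keep their dirty state.

```python
# Toy model only: ToyMemory, ToyCache and take_checkpoint are hypothetical
# names used for illustration; they are not gem5 classes or functions.

class ToyMemory:
    def __init__(self):
        self.data = {}

    def serialize(self):
        # Snapshot of the backing store as it stands right now.
        return dict(self.data)


class ToyCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # addr -> (value, dirty)
        self.in_flight = 1   # pretend one transaction is still outstanding

    def write(self, addr, value):
        self.lines[addr] = (value, True)

    def drain(self):
        # Returns True once no transient state remains; dirty lines stay put.
        if self.in_flight:
            self.in_flight -= 1
            return False
        return True

    def mem_writeback(self):
        # Copy dirty data to memory but leave the line state untouched.
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                self.memory.data[addr] = value

    def drain_resume(self):
        # Nothing to restart in this toy model.
        pass


def take_checkpoint(cache, memory):
    while not cache.drain():   # drain() may need several passes
        pass
    cache.mem_writeback()
    snapshot = memory.serialize()
    cache.drain_resume()
    return snapshot


memory = ToyMemory()
cache = ToyCache(memory)
cache.write(0x40, 7)
snapshot = take_checkpoint(cache, memory)
```

In this sketch the snapshot contains the value written at 0x40 even though 
the cache line is still marked dirty afterwards.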


I don't think draining priorities are necessarily a bad idea; it's just that we 
want to incorporate these fixes into the existing mechanisms if at all possible.

I think we probably need a ruby person to comment on this.  The issue is that 
the flushing is based on the trace that is generated.  So we can either:

1) Create the trace twice (once in memWriteback() for flushing, then again in 
serialize() for writing out)
2) Create the trace in memWriteback() but keep it hanging around for the later 
serialize() call (but what if this is never called...)

Neither strikes me as an ideal solution, so perhaps I'm missing something that 
someone with better knowledge of ruby can spot.
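
As a rough illustration of the trade-off between the two options, here is a 
toy Python sketch.  ToyRuby and all of its methods are hypothetical and are 
not how ruby is structured.  It also assumes that drainResume() would be an 
acceptable place to throw away a trace that was never serialized, which is 
exactly the sort of thing a ruby person would need to confirm.

```python
# Toy model only: ToyRuby and its methods are hypothetical, not gem5 code.

class ToyRuby:
    def __init__(self, dirty_lines):
        self.dirty_lines = dict(dirty_lines)  # addr -> value
        self.memory = {}
        self.traces_built = 0
        self._pending_trace = None

    def _build_trace(self):
        # Stand-in for the (expensive) walk over the caches.
        self.traces_built += 1
        return sorted(self.dirty_lines.items())

    def mem_writeback(self):
        # Option 2: build the trace once, flush from it, and keep it around.
        self._pending_trace = self._build_trace()
        for addr, value in self._pending_trace:
            self.memory[addr] = value

    def serialize(self):
        # Reuse the trace left by mem_writeback() if there is one.  Building
        # it again here instead would be option 1 (two trace builds).
        trace = self._pending_trace
        if trace is None:
            trace = self._build_trace()
        self._pending_trace = None
        return trace

    def drain_resume(self):
        # Assumption: if serialize() was never called, drop the stale trace.
        self._pending_trace = None


ruby = ToyRuby({0x80: 3, 0x40: 7})
ruby.mem_writeback()
trace = ruby.serialize()
ruby.drain_resume()
```

Here the trace is built only once per checkpoint, at the cost of holding it 
in memory between the two calls and needing some hook to clean it up.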


- Timothy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2908/#review6581
-----------------------------------------------------------


On June 24, 2015, 7:43 a.m., Timothy Jones wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2908/
> -----------------------------------------------------------
> 
> (Updated June 24, 2015, 7:43 a.m.)
> 
> 
> Review request for Default and Ruby Reviewers.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> ruby: Fix checkpointing and restore
> 
> There are 2 problems with the existing checkpoint and restore code in ruby.  
> The first is that the event queue is altered by ruby during serialization, 
> meaning that the event to stop simulation that always lives on the queue 
> can't be found, causing a panic.  This is fixed by explicitly descheduling it 
> before swapping events off the main queue.
> 
> The other happens occasionally when ruby is serialized after the memory 
> system.  In this case the dirty data in ruby's caches is flushed back to 
> memory too late and so isn't included in the checkpoint.  This is fixed by 
> adding serialization priorities, thus ensuring that ruby can be serialized 
> first, perform its flush, and make sure memory has the most up-to-date data 
> before it is checkpointed too.
> 
> 
> Diffs
> -----
> 
>   src/mem/ruby/system/CacheRecorder.cc e4f63f1d502d 
>   src/mem/ruby/system/System.cc e4f63f1d502d 
>   src/sim/sim_object.hh e4f63f1d502d 
>   src/sim/sim_object.cc e4f63f1d502d 
> 
> Diff: http://reviews.gem5.org/r/2908/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Timothy Jones
> 
>

