Hi Haiyang,

Checkpointing with Ruby hasn't been tested in a while, as far as I know. By that I mean that I haven't used the feature in a long time, and I don't know of anyone tracking mainline gem5 who uses Ruby with checkpointing :).

When taking a checkpoint, you have to use a protocol that implements the "FLUSH" command. I believe that only MOESI_hammer implements this command. I could be wrong, though.

Can you checkpoint with MOESI_hammer and restore with MOESI_hammer?

If you're getting deadlocks during restore, it's probably a real deadlock. FIFO ordering violations also seem like something you can't safely ignore.

Are you using the same topology for checkpoint and restore? Ruby uses an order-specific list to save and restore the controllers. So, if you restore with a different number of directories, it's possible this perturbs the controller list so that the L1 caches end up in a different place in the list. This might break restore. It's not very robust.

I would suggest using some debug flags to try to track down the problem (RubyCacheTrace is the flag you want for printing info while restoring; ProtocolTrace may also help).

You've likely found a bug. If you can track it down and post a fix to Gerrit, we'd appreciate it! Let me know if you run into any other questions.

Jason

On Fri, Mar 16, 2018 at 1:45 PM Haiyang Han <[email protected]> wrote:

> Hi all,
>
> I'm trying to create and restore checkpoints with Ruby while simulating a
> 16-core O3CPU, full-system, x86 configuration. I can create the checkpoints
> with no problem, but a little while after restoring from the checkpoints, I
> am seeing all sorts of gem5 aborts due to panics. Sometimes it complains
> about a possible deadlock, other times it complains that FIFO ordering is
> violated. Below are the Ruby protocols I tried:
>
> Protocol used to write checkpoint:    Protocol used to restore:
> MOESI_hammer                          MESI_Two_level
> MOESI_hammer                          MESI_Three_level
> MESI_Two_level                        MESI_Two_level
> MESI_Three_level                      MESI_Three_level
>
> I read on http://gem5.org/Checkpoints that only MOESI_hammer supports the
> writing of checkpoints. Does this still apply to the newest gem5 versions?
> Could it be that the traffic generated by the 16 cores is too much for the
> Ruby system to handle correctly? Is it possible to solve the deadlock issue
> by manually increasing a threshold somewhere in the source code? What about
> the FIFO ordering violation? It would be great if any of these could be
> answered :D
>
> Thanks!
> Haiyang
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
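
As a toy illustration of the controller-ordering issue described in the reply above, the following Python sketch saves state by list position and restores it by list position. It is not gem5's actual checkpoint code: the function names (build_controllers, save_by_position, restore_by_position), the controller names, and the assumption that directories come before L1 caches in the list are all invented here purely to show the effect of changing the directory count.

```python
def build_controllers(num_dirs, num_l1s):
    """Return controller names in a fixed construction order (hypothetical layout)."""
    dirs = [f"dir{i}" for i in range(num_dirs)]
    l1s = [f"l1_{i}" for i in range(num_l1s)]
    return dirs + l1s


def save_by_position(controllers, state):
    """Serialize state as a plain list, one entry per controller, in list order."""
    return [state[name] for name in controllers]


def restore_by_position(controllers, saved):
    """Hand the i-th saved entry to the i-th controller, ignoring names."""
    return dict(zip(controllers, saved))


# Checkpoint taken with 1 directory and 2 L1 caches.
ckpt_ctrls = build_controllers(num_dirs=1, num_l1s=2)
state = {name: f"state-of-{name}" for name in ckpt_ctrls}
saved = save_by_position(ckpt_ctrls, state)

# Restoring with the same topology puts every entry back where it came from.
same = restore_by_position(build_controllers(1, 2), saved)

# Restoring with 2 directories shifts the list, so L1 state lands elsewhere.
shifted = restore_by_position(build_controllers(2, 2), saved)
```

With the matching topology, `same` equals the original `state`. With the extra directory, the entry saved for `l1_0` is handed to `dir1`, which is the kind of silent mismatch that could plausibly show up later as a deadlock or ordering panic.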
