The clock rate on the server is only set to 1GHz on the checkpoint run (as opposed to 3GHz for the detailed simulation). How slow should it be set? Are we talking nearer to 250MHz?
Thanks,
-Rick

Ali Saidi wrote:
> That's almost certainly what is happening. Different packets are
> trying to be sent, both originating from the kernel. This isn't a
> device bug. It's exactly what that paper described. The delay observed
> by the server has changed so dramatically that a retransmit is
> occurring because the ack didn't arrive within twice the round-trip
> latency. You should add some latency to the ethernet link, and drive
> the server with a slower CPU during the checkpoint run. That will
> normally fix the problem.
>
> Ali
>
> On Jan 28, 2009, at 5:59 PM, Rick Strong wrote:
>
>> This is interesting. Thanks for the link.
>>
>> -Rick
>>
>> Lisa Hsu wrote:
>>
>>> Your description that it only occurs when you switch to a timing sim
>>> makes me think of this (not to toot my own horn or anything):
>>>
>>> http://www.eecs.umich.edu/~hsul/pubs/mobs05.pdf
>>>
>>> Just throwing that out as a possibility. You might want to "slow
>>> down" your checkpoint dropping run so that it's not so disruptive
>>> when you switch over to timing.
>>>
>>> Lisa
>>>
>>> On Wed, Jan 28, 2009 at 5:32 PM, Rick Strong <[email protected]> wrote:
>>>
>>> I have posted tar.gz files that include EthernetAll output (in
>>> ethernet_all.trace) @ http://rickshin.ucsd.edu. Once you have a
>>> chance, if you could take a look at the trace and figure out what is
>>> wrong, that would be great.
>>>
>>> I ended up restoring from the checkpoint for two runs. One run stays
>>> in atomic mode while the other switches to timing and detailed. The
>>> run that stays in atomic mode works fine. This leads me to believe
>>> that the checkpoint restore mechanism is fine. The fault likely lies
>>> in the switching to timing or detailed mode.
>>>
>>> The differences between the runs:
>>>
>>> (1) m5out-atomic-run-aftercheckpoint.tar.gz is a run that stays in
>>> atomic mode (no switching to timing mode).
>>>
>>> (2) m5out-timing-run-aftercheckpoint.tar.gz is a run that switches to
>>> timing and then to detailed mode.
>>>
>>> Thanks and good luck,
>>>
>>> -Rick
>>>
>>> Ali Saidi wrote:
>>>
>>>> Looking at the trace, it appears as though you just restored from a
>>>> checkpoint. Is this the case? If so, what does the checkpoint
>>>> dropping run do after that checkpoint is created? It's so early in
>>>> the trace that I would guess it's a serialization bug, particularly
>>>> in the TSO code. However, I looked quickly at the code and nothing
>>>> seemed to jump out at me. If you can provide me with an EthernetAll
>>>> trace from the checkpoint run and from the restored run, I can work
>>>> on figuring out what the problem is.
>>>>
>>>> Ali
>>>>
>>>> On Jan 28, 2009, at 2:53 AM, Rick Strong wrote:
>>>>
>>>>>> There are three possibilities here:
>>>>>> a) A kernel bug
>>>>>> b) a device model/driver bug
>>>>>> c) a checkpointing bug (as it relates to (b))
>>>>>>
>>>>>> What kernel version are you using? Could you put the ethernet
>>>>>> trace somewhere so I could look at it?
>>>>>>
>>>>>> Ali
>>>>>>
>>>>> I am using the kernel 2.6.18 with M5 patches.
>>>>>
>>>>> I have put the ethernet traces up at http://rickshin.ucsd.edu. It is
>>>>> the only link. If you take a look, let me know what you think.
>>>>>
>>>>> Best,
>>>>> -Rick
>>>>>
>>>>> _______________________________________________
>>>>> m5-dev mailing list
>>>>> [email protected]
>>>>> http://m5sim.org/mailman/listinfo/m5-dev
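Ali's explanation (the server retransmits because the ack did not come back within about twice the round-trip time it had learned during the fast checkpoint run) can be illustrated with the classic RFC 793 retransmission-timeout estimator: a smoothed RTT (SRTT) and RTO = BETA * SRTT with BETA = 2. This is only a sketch under that assumption. Linux's TCP uses a different (Jacobson/Karels-style) estimator with a minimum RTO, and the names `RtoEstimator` and `spurious_timeouts`, the ALPHA/BETA values, and all RTT samples are hypothetical illustration values, not data from the traces in this thread.

```python
from dataclasses import dataclass

# Sketch of the classic RFC 793 retransmission-timeout (RTO) estimator.
# ALPHA and BETA are hypothetical choices within RFC 793's suggested ranges;
# real kernels (including Linux) use a different estimator and RTO bounds.
ALPHA = 0.875  # weight given to the old smoothed RTT
BETA = 2.0     # RTO = BETA * SRTT, i.e. "twice the round-trip latency"


@dataclass
class RtoEstimator:
    srtt: float  # smoothed round-trip time (arbitrary units, e.g. microseconds)

    def rto(self) -> float:
        """Current retransmission timeout."""
        return BETA * self.srtt

    def sample(self, rtt: float) -> bool:
        """Feed one measured RTT; return True if the ack would arrive after the RTO.

        Simplification: a real stack would discard RTT samples from
        retransmitted segments (Karn's algorithm); this sketch always updates.
        """
        timed_out = rtt > self.rto()
        self.srtt = ALPHA * self.srtt + (1 - ALPHA) * rtt
        return timed_out


def spurious_timeouts(samples):
    """Seed SRTT with the first sample, then report a timeout flag per sample."""
    est = RtoEstimator(srtt=samples[0])
    return [est.sample(r) for r in samples]


# Hypothetical RTTs: a fast checkpoint run followed by a slower detailed run.
fast_checkpoint = [20.0, 21.0, 19.0, 20.0, 150.0, 150.0, 150.0]
# Same detailed-run RTTs, but the checkpoint run was slowed down (slower
# server CPU and/or extra link latency), so its RTTs are already closer.
slowed_checkpoint = [120.0, 125.0, 118.0, 120.0, 150.0, 150.0, 150.0]

print(spurious_timeouts(fast_checkpoint))    # [False, False, False, False, True, True, True]
print(spurious_timeouts(slowed_checkpoint))  # [False, False, False, False, False, False, False]
```

The second case mirrors Ali's suggested fix: if the RTT the server learns before the switch is not much smaller than the RTT it sees afterwards, the jump never exceeds the timeout.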
