That's almost certainly what is happening. Different packets are being sent, both originating from the kernel, so this isn't a device bug. It's exactly what that paper described: the delay observed by the server has changed so dramatically that a retransmit occurs, because the ACK didn't arrive within twice the round-trip latency. You should add some latency to the Ethernet link and drive the server with a slower CPU during the checkpoint run. That will normally fix the problem.
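To make the timing argument concrete, here is a rough sketch of the effect. It assumes a fixed timeout of twice the round-trip time the sender last measured; real TCP stacks use a smoothed RTT estimator, so this is only an illustration. The function name and all the numbers are made up, nothing here comes from the trace.

```python
def spurious_retransmit(measured_rtt_us, actual_rtt_us, timeout_factor=2.0):
    """Return True if the sender's timer would fire before the ACK arrives.

    Simplified model (an assumption, not the kernel's real algorithm):
    the timeout is a fixed multiple of the last RTT the sender measured.
    """
    timeout_us = timeout_factor * measured_rtt_us
    return actual_rtt_us > timeout_us


# Hypothetical numbers, chosen only to show the effect.
fast_rtt_us = 100.0  # RTT the kernel saw during the fast checkpoint run
slow_rtt_us = 450.0  # RTT after switching to timing/detailed mode

# Staying in the fast mode: the ACK arrives well inside the timeout.
print(spurious_retransmit(fast_rtt_us, fast_rtt_us))

# After the switch: the ACK is still in flight when the timer expires.
print(spurious_retransmit(fast_rtt_us, slow_rtt_us))

# Adding 200 us of link latency to the checkpoint run raises the RTT the
# kernel has measured, so the restored run no longer times out.
print(spurious_retransmit(fast_rtt_us + 200.0, slow_rtt_us))
```

With these made-up values only the second case retransmits, which is why slowing down the checkpoint run tends to help.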
Ali

On Jan 28, 2009, at 5:59 PM, Rick Strong wrote:

> This is interesting. Thanks for the link.
>
> -Rick
>
> Lisa Hsu wrote:
>> Your description that it only occurs when you switch to a timing sim
>> makes me think of this (not to toot my own horn or anything):
>>
>> http://www.eecs.umich.edu/~hsul/pubs/mobs05.pdf
>>
>> Just throwing that out as a possibility. You might want to "slow
>> down" your checkpoint dropping run so that it's not so disruptive
>> when you switch over to timing.
>>
>> Lisa
>>
>> On Wed, Jan 28, 2009 at 5:32 PM, Rick Strong <[email protected]> wrote:
>>
>>     I have posted tar.gz files that include EthernetAll output (in
>>     ethernet_all.trace) @ http://rickshin.ucsd.edu. Once you have a
>>     chance, if you could take a look at the trace and figure out
>>     what is wrong, that would be great.
>>
>>     I ended up restoring from the checkpoint for two runs. One run
>>     stays in atomic mode while the other switches to timing and
>>     detailed. The run that stays in atomic mode works fine. This
>>     leads me to believe that the checkpoint restore mechanism is
>>     fine. The fault likely lies in the switch to timing or
>>     detailed mode.
>>
>>     The differences between the runs:
>>
>>     (1) m5out-atomic-run-aftercheckpoint.tar.gz is a run that stays
>>         in atomic mode (no switching to timing mode)
>>
>>     (2) m5out-timing-run-aftercheckpoint.tar.gz is a run that
>>         switches to timing and then to detailed mode.
>>
>>     Thanks and good luck,
>>
>>     -Rick
>>
>>     Ali Saidi wrote:
>>> Looking at the trace, it appears as though you just restored from a
>>> checkpoint. Is this the case? If so, what does the checkpoint
>>> dropping run do after that checkpoint is created? It's so early in
>>> the trace that I would guess it's a serialization bug, particularly
>>> in the TSO code. However, I looked quickly at the code and nothing
>>> seemed to jump out at me. If you can provide me with an EthernetAll
>>> trace from the checkpoint run and from the restored run I can work
>>> on figuring out what the problem is.
>>>
>>> Ali
>>>
>>> On Jan 28, 2009, at 2:53 AM, Rick Strong wrote:
>>>
>>>>> There are three possibilities here:
>>>>> a) A kernel bug
>>>>> b) a device model/driver bug
>>>>> c) a checkpointing bug (as it relates to (b))
>>>>>
>>>>> What kernel version are you using? Could you put the ethernet
>>>>> trace somewhere so I could look at it?
>>>>>
>>>>> Ali
>>>>>
>>>> I am using the kernel 2.6.18 with M5 patches.
>>>>
>>>> I have put the ethernet traces up at http://rickshin.ucsd.edu. It
>>>> is the only link. If you take a look, let me know what you think.
>>>>
>>>> Best,
>>>> -Rick
>>>>
>>>> _______________________________________________
>>>> m5-dev mailing list
>>>> [email protected]
>>>> http://m5sim.org/mailman/listinfo/m5-dev
