That's almost certainly what is happening. Different packets are
being sent, and both originate from the kernel, so this isn't a
device bug. It's exactly what that paper described: the delay observed
by the server has changed dramatically, so a retransmit occurs
because the ack didn't arrive within twice the round trip latency.
You should add some latency to the ethernet link and drive the server
with a slower CPU during the checkpoint run. That will normally fix
the problem.
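
To make the timeout reasoning above concrete, here is a minimal sketch
of a smoothed retransmit timer in the style of RFC 6298. It is not the
kernel's code. The function names, the sample values, and the zero
timeout floor are assumptions made for illustration (real stacks clamp
the timeout to a minimum), and it uses the RFC's SRTT + 4*RTTVAR rule
rather than the "twice the round trip" shorthand.

```python
def rto_after_samples(rtt_samples, alpha=1 / 8, beta=1 / 4, min_rto=0.0):
    """Return the retransmit timeout learned from a list of RTT samples.

    Follows the RFC 6298 update rules. Units are whatever the samples
    use (microseconds in the example below). min_rto is a hypothetical
    floor, left at zero here so the effect is easy to see.
    """
    srtt = None
    rttvar = None
    for rtt in rtt_samples:
        if srtt is None:
            # First measurement seeds both estimators.
            srtt = rtt
            rttvar = rtt / 2
        else:
            # The variance estimate is updated before the smoothed RTT.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - rtt)
            srtt = (1 - alpha) * srtt + alpha * rtt
    return max(min_rto, srtt + 4 * rttvar)


def would_retransmit(rto, actual_rtt):
    """True if the ack would arrive after the timer has already fired."""
    return actual_rtt > rto


# Illustrative numbers only: a long run of short, steady round trips
# (as in a fast checkpoint-dropping run), then one much longer one
# (as after switching to a slower, more detailed mode).
learned_rto = rto_after_samples([100.0] * 50)
spurious = would_retransmit(learned_rto, 500.0)
```

With steady samples the variance term decays and the timeout settles
just above the measured round trip, so a sudden jump in delay looks
like a lost packet. Slowing the checkpoint run makes the learned
round trip closer to what the detailed run will see.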

Ali




On Jan 28, 2009, at 5:59 PM, Rick Strong wrote:

> This is interesting. Thanks for the link.
>
> -Rick
>
> Lisa Hsu wrote:
>> Your description that it only occurs when you switch to a timing sim
>> makes me think of this (not to toot my own horn or anything):
>>
>> http://www.eecs.umich.edu/~hsul/pubs/mobs05.pdf
>>
>> Just throwing that out as a possibility.  You might want to "slow
>> down" your checkpoint dropping run so that it's not so disruptive
>> when you switch over to timing.
>>
>> Lisa
>>
>> On Wed, Jan 28, 2009 at 5:32 PM, Rick Strong <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>    I have posted tar.gz files that include EthernetAll output (in
>>    ethernet_all.trace) @ http://rickshin.ucsd.edu. Once you have a
>>    chance, if you could take a look at the trace and figure out what
>>    is wrong, that would be great.
>>
>>    I ended up restoring from the checkpoint for two runs. One run
>>    stays in atomic mode while the other switches to timing and
>>    detailed. The run that stays in atomic mode works fine. This leads
>>    me to believe that the checkpoint restore mechanism is fine. The
>>    fault likely lies in the switch to timing or detailed mode.
>>
>>    The differences between the runs:
>>
>>    (1) m5out-atomic-run-aftercheckpoint.tar.gz is a run that stays in
>>    atomic mode (no switching to timing mode)
>>
>>    (2) m5out-timing-run-aftercheckpoint.tar.gz is a run that switches
>>    to timing and then to detailed mode.
>>
>>
>>    Thanks and good luck,
>>
>>    -Rick
>>
>>    Ali Saidi wrote:
>>> Looking at the trace, it appears as though you just restored from a
>>> checkpoint. Is this the case? If so, what does the checkpoint
>>> dropping run do after that checkpoint is created? It's so early in
>>> the trace that I would guess it's a serialization bug, particularly
>>> in the TSO code. However, I looked quickly at the code and nothing
>>> seemed to jump out at me. If you can provide me with an EthernetAll
>>> trace from the checkpoint run and from the restored run I can work
>>> on figuring out what the problem is.
>>>
>>> Ali
>>>
>>>
>>>
>>> On Jan 28, 2009, at 2:53 AM, Rick Strong wrote:
>>>
>>>
>>>>> There are three possibilities here:
>>>>> a) A kernel bug
>>>>> b) a device model/driver bug
>>>>> c) a checkpointing bug (as it relates to (b))
>>>>>
>>>>> What kernel version are you using? Could you put the ethernet
>>>>> trace somewhere so I could look at it?
>>>>>
>>>>> Ali
>>>>>
>>>>>
>>>> I am using kernel 2.6.18 with the M5 patches.
>>>>
>>>> I have put the ethernet traces up at http://rickshin.ucsd.edu. It
>>>> is the only link. If you take a look, let me know what you think.
>>>>
>>>> Best,
>>>> -Rick
>>>>
>>>>
>>>> _______________________________________________
>>>> m5-dev mailing list
>>>> [email protected] <mailto:[email protected]>
>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>
>>>>
>>>
>>
>

