Lisa Hsu wrote:
> Rick,
>
> 1) Just to follow up, did you figure out a good speed for the 
> checkpoint producing run?
My situation is a bit complicated, as I am using asymmetric hardware
(many configurations) at different frequencies and complexities. I
confirmed that a slowdown was indeed happening due to a packet that
timed out on the server side. Your advice was on target. Thanks. My
solution was to add a script that tries different ethernet link delays
from 100us to 10ms and finds a point where greater stability occurred
across all the hardware configurations.
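
For anyone repeating this, here is a minimal sketch of how the delay
values for such a sweep could be generated. It is not the script I
actually used: the function names and the choice of five log-spaced
points are assumptions for illustration, and passing each value to the
simulator is left out because that depends on the configuration script
in use.

```python
def link_delay_sweep(lo_us=100.0, hi_us=10_000.0, steps=5):
    """Return `steps` log-spaced delays (in whole microseconds)
    covering lo_us..hi_us inclusive. Hypothetical helper."""
    if steps < 2:
        raise ValueError("need at least two sweep points")
    # Constant ratio between neighbouring points gives even coverage
    # across the two decades between 100us and 10ms.
    ratio = (hi_us / lo_us) ** (1.0 / (steps - 1))
    return [round(lo_us * ratio ** i) for i in range(steps)]


def format_delay(us):
    """Format a whole number of microseconds as e.g. '100us' or '10ms'."""
    return f"{us // 1000}ms" if us % 1000 == 0 else f"{us}us"


# One delay string per run; each hardware configuration would be
# simulated once per value and checked for timed-out packets.
delays = [format_delay(d) for d in link_delay_sweep()]
print(delays)
```

Log spacing is used here only because the range spans two orders of
magnitude; a linear sweep would work just as well if finer resolution
near 10ms is wanted.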
>
> 2) Did you ever find why everything was 404'ed?
This issue has not been resolved. The initial suggestion was that the
problem was caused by a bad mod_specweb99.so.

However, after talking with Ali, it appears that Apache was never fixed.
I opted to use the version that sends 404 responses, as it still issues
requests and transfers data (albeit requiring more explanation if used).
My last conversation with Ali suggested that he was moving towards
lighttpd, but that licensing issues prevent its general release to the
M5 community. A snippet is copied below.


 "I created a lighttpd one that transfers large files only, but it never 
made it into M5 (nor can it because of licensing issues from the 
components it's based on). I never fixed apache."



Ali
>
> Lisa
>
> On Wed, Jan 28, 2009 at 9:44 PM, Lisa Hsu <[email protected] 
> <mailto:[email protected]>> wrote:
>
>     I couldn't say exactly; too much in M5 has changed since that paper
>     to give an exact number. But I'd imagine that if you take whatever
>     instructions/second you're getting in the detailed run and make
>     the checkpoint run the matching speed, considering its 1 IPC,
>     you'd probably be in the right range.
>
>     Lisa
>
>
>     On Wed, Jan 28, 2009 at 8:26 PM, Rick Strong <[email protected]
>     <mailto:[email protected]>> wrote:
>
>         The clock rate on the server is only set to 1GHz on the
>         checkpoint run (as opposed to 3GHz for the detailed
>         simulation). How slow should it be set? Are we talking nearer
>         to 250MHz?
>
>         Thanks,
>         -Rick
>
>         Ali Saidi wrote:
>         > That's almost certainly what is happening. Different packets are
>         > trying to be sent, both originating from the kernel. This isn't a
>         > device bug. It's exactly what that paper described. The delay
>         > observed by the server has changed so dramatically that a
>         > retransmit is occurring because the ack didn't arrive within
>         > twice the round trip latency. You should add some latency to the
>         > ethernet link, and drive the server with a slower CPU during the
>         > checkpoint run. That will normally fix the problem.
>         >
>         > Ali
>         >
>         >
>         >
>         >
>         > On Jan 28, 2009, at 5:59 PM, Rick Strong wrote:
>         >
>         >
>         >> This is interesting. Thanks for the link.
>         >>
>         >> -Rick
>         >>
>         >> Lisa Hsu wrote:
>         >>
>         >>> Your description that it only occurs when you switch to a
>         >>> timing sim makes me think of this (not to toot my own horn or
>         >>> anything):
>         >>>
>         >>> http://www.eecs.umich.edu/~hsul/pubs/mobs05.pdf
>         >>>
>         >>> Just throwing that out as a possibility.  You might want to
>         >>> "slow down" your checkpoint dropping run so that it's not so
>         >>> disruptive when you switch over to timing.
>         >>>
>         >>> Lisa
>         >>>
>         >>> On Wed, Jan 28, 2009 at 5:32 PM, Rick Strong
>         <[email protected] <mailto:[email protected]>
>         >>> <mailto:[email protected] <mailto:[email protected]>>>
>         wrote:
>         >>>
>         >>>    I have posted tar.gz files that include EthernetAll output (in
>         >>>    ethernet_all.trace) @ http://rickshin.ucsd.edu. Once you have a
>         >>>    chance, if you could take a look at the trace and figure out
>         >>>    what is wrong, that would be great.
>         >>>
>         >>>    I ended up restoring from the checkpoint for two runs. One run
>         >>>    stays in atomic mode while the other switches to timing and
>         >>>    detailed. The run that stays in atomic mode works fine. This
>         >>>    leads me to believe that the checkpoint restore mechanism is
>         >>>    fine. The fault likely lies in the switching to timing or
>         >>>    detailed mode.
>         >>>
>         >>>    The differences between the runs:
>         >>>
>         >>>    (1) m5out-atomic-run-aftercheckpoint.tar.gz is a run that stays
>         >>>    in atomic mode (no switching to timing mode)
>         >>>
>         >>>    (2) m5out-timing-run-aftercheckpoint.tar.gz is a run that
>         >>>    switches to timing and then to detailed mode.
>         >>>
>         >>>
>         >>>    Thanks and good luck,
>         >>>
>         >>>    -Rick
>         >>>
>         >>>    Ali Saidi wrote:
>         >>>
>         >>>> Looking at the trace, it appears as though you just restored
>         >>>> from a checkpoint. Is this the case? If so, what does the
>         >>>> checkpoint dropping run do after that checkpoint is created?
>         >>>> It's so early in the trace that I would guess it's a
>         >>>> serialization bug, particularly in the TSO code. However, I
>         >>>> looked quickly at the code and nothing seemed to jump out at
>         >>>> me. If you can provide me with an EthernetAll trace from the
>         >>>> checkpoint run and from the restored run I can work on
>         >>>> figuring out what the problem is.
>         >>>>
>         >>>> Ali
>         >>>>
>         >>>>
>         >>>>
>         >>>> On Jan 28, 2009, at 2:53 AM, Rick Strong wrote:
>         >>>>
>         >>>>
>         >>>>
>         >>>>>> There are three possibilities here:
>         >>>>>> a) A kernel bug
>         >>>>>> b) a device model/driver bug
>         >>>>>> c) a checkpointing bug (as it relates to (b))
>         >>>>>>
>         >>>>>> What kernel version are you using? Could you put the ethernet
>         >>>>>> trace somewhere so I could look at it?
>         >>>>>>
>         >>>>>> Ali
>         >>>>>>
>         >>>>>>
>         >>>>>>
>         >>>>> I am using kernel 2.6.18 with M5 patches.
>         >>>>>
>         >>>>> I have put the ethernet traces up at http://rickshin.ucsd.edu.
>         >>>>> It is the only link. If you take a look, let me know what you
>         >>>>> think.
>         >>>>>
>         >>>>> Best,
>         >>>>> -Rick
>         >>>>>
>         >>>>>
>         >>>>> _______________________________________________
>         >>>>> m5-dev mailing list
>         >>>>> [email protected] <mailto:[email protected]>
>         <mailto:[email protected] <mailto:[email protected]>>
>         >>>>> http://m5sim.org/mailman/listinfo/m5-dev
>         >>>>>
>         >>>>>
>         >>>>>