Andy Doan <[email protected]> writes:

> On 11/09/2012 07:37 AM, Dave Pigott wrote:
>> Ignoring the one failure while I was getting tc2 up and running, we have the
>> following:
>>
>> ------------
>> panda06
>> ------------
>> http://validation.linaro.org/lava-server/scheduler/job/38176
>>
>> The key part is in downloading root.tgz. It gets part way through and then
>> we get "connection reset by peer" on every single retry until we fail.
>>
>> I've put it back online to retest.
>
> I think this is now our #1 failure issue in LAVA. We've looked at this
> in the past, added debugging, made hypotheses. However, we really
> haven't gotten to the bottom of this.
>
> One data point I can add. When this happens, I've logged onto control
> and run wget on the failed URL and it works. So, this doesn't appear to
> be related to Apache or server load. I *think* I've also done wgets
> from another system in the lab. So, I don't think it's a network/router
> thing either.

Thanks for checking this. My gut already said "duff networking in the
master image" but nice to have some data.

> We already have some retry logic there, but maybe we need something more
> sophisticated? (my gut says "no")

Well. Rebooting the master image would almost certainly fix it. Don't
know how to detect when that is the thing to do though.

Interesting that we didn't see this on staging at all -- is it
concentrated on particular boards? It might have a hardware aspect.

Cheers,
mwh
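
On the "how to detect when a reboot is the thing to do" question, a rough
sketch of one possible heuristic: count consecutive "connection reset by
peer" failures during the download and flag the board for a reboot once
they pass a threshold. Everything here is an assumption for illustration:
ResetTracker, download_with_escalation, the fetch callable and the
threshold of 3 are hypothetical names and values, not existing LAVA code,
and the sketch is written for Python 3.

```python
import errno


def is_connection_reset(exc):
    """Return True if exc looks like a 'connection reset by peer' error."""
    if isinstance(exc, ConnectionResetError):
        return True
    return getattr(exc, "errno", None) == errno.ECONNRESET


class ResetTracker:
    """Hypothetical helper: counts connection resets that occur in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_resets = 0

    def record(self, exc=None):
        """Record one attempt (exc is None on success).

        Returns True once `threshold` resets have been seen in a row.
        """
        if exc is not None and is_connection_reset(exc):
            self.consecutive_resets += 1
        else:
            # Any other outcome breaks the streak.
            self.consecutive_resets = 0
        return self.consecutive_resets >= self.threshold


def download_with_escalation(fetch, url, retries=5, threshold=3):
    """Retry fetch(url); report 'reboot' if resets keep repeating.

    `fetch` is any callable that returns the data or raises OSError.
    Returns a (status, data) tuple where status is 'ok', 'reboot'
    or 'failed'.
    """
    tracker = ResetTracker(threshold)
    for _ in range(retries):
        try:
            return "ok", fetch(url)
        except OSError as exc:
            if tracker.record(exc):
                # Assumption: repeated resets point at the master image.
                return "reboot", None
    return "failed", None
```

The caller would still have to decide what "reboot" means for a given
board, and whether the threshold should differ per device type if the
problem does turn out to be concentrated on particular hardware.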
