Re: [ofa-general][PATCH 3/4] SRP fail-over faster

David Dillow Wed, 28 Oct 2009 08:09:52 -0700

On Sat, 2009-10-24 at 03:35 -0400, Vu Pham wrote:
> It's a big improvement from 3-5 minutes cutting down to 1s and now you
> talk about device_loss_timeout=0. I'll look at the trade-off to have
> it; however, to receive and process the async event (port error)
> already cost you a fair amount of cycles.


I agree that it is a great improvement over just sending packets blindly
to the link, and waiting for SCSI to time them out -- I've been using
the variant of the patches from OFED -- but it is harder to change
things once they are in the mainstream kernel, so I'd like to see it
done better.

And hey, maybe I'm just overly touchy about this. These should be rare
events, and there's nothing we can do about the commands sent prior to
being told about the link error. It's just that I don't want my file
system to stall the petaflop simulation platforms if I can avoid it --
and there's no reason to send any command down the wire once we've been
told there is no link or the target is not there. Maybe we don't need to
destroy the link immediately, but we need to let the SCSI mid-layer know
that things are failing.
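
To make the fail-fast idea above concrete, here is a minimal user-space sketch. It is only a toy model, not driver code: `ToyTarget`, `LinkState`, `port_error`, `reconnected`, `queue_command`, and the string return values are all hypothetical names chosen for illustration. They do not correspond to the SRP initiator or to the SCSI mid-layer API. The sketch assumes that once the port error has been reported, every new command is completed at once with an error instead of being put on the wire.

```python
from enum import Enum


class LinkState(Enum):
    LIVE = "live"        # commands may go on the wire
    FAILED = "failed"    # we were told the link or target is gone


class ToyTarget:
    """Hypothetical stand-in for an SRP target port; not driver code."""

    def __init__(self):
        self.state = LinkState.LIVE
        self.on_wire = []  # commands actually sent to the target

    def port_error(self):
        # Async event: the link went down or the target disappeared.
        self.state = LinkState.FAILED

    def reconnected(self):
        # The link is usable again, so commands may flow once more.
        self.state = LinkState.LIVE

    def queue_command(self, cmd):
        if self.state is not LinkState.LIVE:
            # Fail immediately so the upper layer can try another path
            # instead of waiting for a command timeout.
            return "transport-failure"
        self.on_wire.append(cmd)
        return "queued"


target = ToyTarget()
first = target.queue_command("READ a")   # sent before the error is known
target.port_error()
second = target.queue_command("READ b")  # rejected without touching the wire
```

Note that the first command is still on the wire and would have to time out or be aborted separately; the model only covers commands submitted after the error was reported, which is the case discussed above.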
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


