On Sat, 2009-10-24 at 03:35 -0400, Vu Pham wrote:
> It's a big improvement from 3-5 minutes cutting down to 1s and now you
> talk about device_loss_timeout=0. I'll look at the trade-off to have
> it; however, to receive and process the async event (port error)
> already cost you a fair amount of cycles.
I agree that it is a great improvement over blindly sending packets to the link and waiting for SCSI to time them out (I've been using the variant of the patches from OFED). But it is harder to change things once they are in the mainline kernel, so I'd like to see it done better.

And hey, maybe I'm just overly touchy about this. These should be rare events, and there's nothing we can do about the commands sent before we were told about the link error. It's just that I don't want my file system to stall the petaflop simulation platforms if I can avoid it, and there's no reason to send any command down the wire once we've been told there is no link or the target is not there. Maybe we don't need to destroy the link immediately, but we do need to let the SCSI mid-layer know that things are failing.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
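For illustration, here is a minimal userspace sketch in C of the fail-fast policy argued for above. It assumes a simple two-state model in which a port error marks the target "link down" without tearing down the connection, and every new command issued in that state is completed with an error at once instead of being posted to the wire. The names `enum tgt_state`, `struct target`, `handle_port_event` and `queue_command` are hypothetical. They are not the initiator driver's real structures or the SCSI mid-layer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-target state; not the real driver structures. */
enum tgt_state {
    TGT_LIVE,       /* link up: commands are posted to the wire */
    TGT_LINK_DOWN,  /* port error seen: connection kept, but nothing is sent */
};

/* Hypothetical outcomes of queuing one command. */
enum qc_result {
    QC_SENT,        /* command was posted to the wire */
    QC_FAIL_FAST,   /* command completed with an error immediately, so the
                       mid-layer learns of the failure without a timeout */
};

struct target {
    enum tgt_state state;
    unsigned long sent;    /* commands posted to the wire */
    unsigned long failed;  /* commands failed without being sent */
};

/* Hypothetical async event hook: port_active == false models a port error. */
static void handle_port_event(struct target *t, bool port_active)
{
    assert(t != NULL);
    t->state = port_active ? TGT_LIVE : TGT_LINK_DOWN;
}

/*
 * Once we have been told the link is gone, never put a command on the wire.
 * Commands posted before the event arrived are not covered by this check.
 */
static enum qc_result queue_command(struct target *t)
{
    assert(t != NULL);
    if (t->state == TGT_LINK_DOWN) {
        t->failed++;
        return QC_FAIL_FAST;
    }
    t->sent++;
    return QC_SENT;
}
```

Keeping a separate "link down" state, rather than destroying the connection right away, matches the point that the link need not be torn down immediately. Only new I/O is refused, so the failure surfaces right away instead of after a SCSI timeout.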
