On Wed, 2019-04-24 at 13:58 -0700, Sagi Grimberg wrote:
> > It isn't that the media is slow; the max timeout is based on the SLA
> > for certain classes of "fabric" outages. Linux copes *really* badly
> > with I/O errors, and if we can make the timeout last long enough to
> > cover the switch restart worst case, then users are a lot happier.
>
> Well, what is usually done to handle fabric outages is having multiple
> paths to the storage device, not sure if that is applicable for you or
> not...

Yeah, that turns out to be impractical in this case.

> What do you mean by "Linux copes *really* badly with I/O errors"? What
> can be done better?

There's not a lot that can be done here in the short term. If file
systems get errors on certain I/O, then graceful recovery would be
complicated to achieve. Better for the I/O timeout to be set higher
than the known worst case time for successful completion.

