On Wed, 2019-04-24 at 13:58 -0700, Sagi Grimberg wrote:
> > It isn't that the media is slow; the max timeout is based on the SLA
> > for certain classes of "fabric" outages. Linux copes *really* badly
> > with I/O errors, and if we can make the timeout last long enough to
> > cover the switch restart worst case, then users are a lot happier.
> 
> Well, what is usually done to handle fabric outages is having multiple
> paths to the storage device, not sure if that is applicable for you or
> not...

Yeah, that turns out to be impractical in this case.

> What do you mean by "Linux copes *really* badly with I/O errors"? What
> can be done better?

There's not a lot that can be done here in the short term. If file
systems get errors on certain I/O, then graceful recovery would be
complicated to achieve.

It is better to set the I/O timeout higher than the known worst-case
time for successful completion.
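
As a rough illustration of that rule, here is a minimal sketch that
derives a timeout from a known worst-case completion time plus some
headroom. The function name, the 25% default margin and the 120-second
figure are assumptions made up for this example; none of them come from
the thread or from any real SLA.

```python
def pick_io_timeout(worst_case_s: int, margin_pct: int = 25) -> int:
    """Return an I/O timeout in whole seconds that is strictly above
    the worst-case time for successful completion.

    worst_case_s and margin_pct are hypothetical inputs for this sketch.
    """
    if worst_case_s <= 0:
        raise ValueError("worst_case_s must be positive")
    if margin_pct <= 0:
        raise ValueError("margin_pct must be positive so the timeout exceeds the worst case")
    # Integer ceiling of worst_case_s * (1 + margin_pct / 100).
    return (worst_case_s * (100 + margin_pct) + 99) // 100


# Hypothetical worst case: a switch restart that stalls I/O for 120 s.
timeout_s = pick_io_timeout(120)
```

How the resulting value is applied (for example through a driver's
timeout setting) is outside the scope of this sketch.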
