--On 20 September 2017 at 12:44:18 +0100 Roger Pau Monné <roger....@citrix.com> wrote:

Is there some 'tuneable' we can set to make the 10.3 boxes more tolerant
of the I/O delays that occur during a storage fail over?

Do you know whether the VMs saw the disks disconnecting and then
connecting again?

I can't see any evidence the drives actually get 'disconnected' from the VM's point of view. Plenty of I/O errors - but no "device destroyed" type stuff.

I have seen that kind of error logged on our test kit when we deliberately failed non-HA storage, but I don't see it this time.

Hm, I have the feeling that part of the problem is that in-flight
requests are basically lost when a disconnect/reconnect happens.

So if a disconnect doesn't happen (as appears to be the case) - is there any tunable to set the I/O timeout?

'sysctl -a | grep timeout' finds things like:

 kern.cam.ada.default_timeout=30

I might see if that has any effect (from memory - as I'm out of the office now - it did seem to be about 30 seconds before the VMs started logging I/O related errors to the console).

As it's a pure test setup - I can try adjusting this without fear of breaking anything :)
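
For reference, a rough sketch of how that test could be run on one of the guests. It assumes kern.cam.ada.default_timeout is actually the knob that governs these disks, which is not established here (disks attached through the Xen block front end may not go through the ada timeout at all), and the value 120 is an arbitrary choice for testing, not a recommendation.

```shell
# Sketch only: assumes a FreeBSD guest and that the ada(4) timeout is the relevant knob.
oid=kern.cam.ada.default_timeout

# Show the current value (in seconds), if the OID exists on this system.
if current=$(sysctl -n "$oid" 2>/dev/null); then
    echo "$oid is currently $current"
else
    echo "$oid is not available on this system"
fi

# To raise it at runtime (as root); 120 is an arbitrary test value:
#   sysctl kern.cam.ada.default_timeout=120
# To keep the setting across reboots, add this line to /etc/sysctl.conf:
#   kern.cam.ada.default_timeout=120
```

If the I/O errors still start appearing at roughly 30 seconds after raising the value, that would suggest the timeout is being applied somewhere else.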

Though I'm open to other suggestions...

fwiw - Whose responsibility is it to re-send lost "in flight" data? E.g. if a write is 'in flight' when an I/O error occurs in the lower layers of XenServer, is it XenServer's responsibility to retry it before giving up, or does it just push the error straight back to the VM, expecting the VM to retry it? [or a bit of both?] - just curious.

-Karl


_______________________________________________
freebsd-xen@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-xen
To unsubscribe, send any mail to "freebsd-xen-unsubscr...@freebsd.org"
