On Tue, Jul 28, 2020 at 4:58 AM Shubha Kulkarni <[email protected]> wrote:
>
> Hello,
>
> In oVirt, we have a property propagate_error at the disk level that
> decides, in case of an error, how the error is propagated to the VM.
> This value is maintained in the database table with the default value
> set to Off. The default setting (Off) results in a policy that ends up
> pausing the VM rather than propagating the errors to the VM. There is
> currently no provision in the UI to configure this property for disks
> (images or LUNs), so there is no easy way to set this value. Further,
> even if the value is manually set to "On" in the db, it gets overwritten
> by the UI every time some other property is updated, as described here -
> https://bugzilla.redhat.com/show_bug.cgi?id=1669367
>
> Setting the value to "Off" is not ideal for multipath devices where a
> single path failure causes the VM to pause.

Single path failure should be transparent to qemu. multipath will fail
over the I/O to another path. The I/O will fail only if all paths are
down and (with the default configuration) the multipath path checkers
have failed 4 times.

> It puts serious restrictions on
> the DR situation and, unlike VMware and Hyper-V, oVirt is not able to
> support the DR functionality -
> https://bugzilla.redhat.com/show_bug.cgi?id=1314160

Although in this bug we see that a failover that looks successful from
the multipath and vdsm point of view ended in a paused VM:
https://bugzilla.redhat.com/1860377

Maybe Ben can explain how this can happen.

I hope that qemu will provide more info on errors in the future. If we
had a log about the failed I/O, it could be helpful.

> While we wait for the RFE, the proposal here is to revise the
> out-of-the-box behavior for LUNs. For LUNs, we should propagate the
> errors to the VM rather than directly stopping the VM. This will allow
> us to handle short-term multipath outages and improve availability.
> This is a simple change in behavior but will have a good positive
> impact. I would like to seek feedback about this to make sure that
> everyone is ok with the proposal.

I think it makes sense, but this is just a default, and it cannot work
for all cases.

This can end in a broken VM with a read-only file system that must be
rebooted, while with error_policy="stop", failover may be transparent
to the VM even if it was paused for a short time.

I would start by making engine defaults configurable using engine
config, so different oVirt distributions can use different defaults.

Nir
