On Tue, Jul 28, 2020 at 4:58 AM Shubha Kulkarni
<[email protected]> wrote:
>
> Hello,
>
> In oVirt, we have a property propagate_error at the disk level that
> decides how an error is propagated to the VM when one occurs.
> This value is maintained in the database table with the default value
> set as Off. The default setting (Off) results in a policy that ends up
> pausing the VM rather than propagating the errors to the VM.  There is no
> provision in the UI currently to configure this property for disks
> (images or LUNs). So there is no easy way to set this value.  Further,
> even if the value is manually set to "On" in the db, it gets overwritten
> by the UI every time some other property is updated, as described here -
> https://bugzilla.redhat.com/show_bug.cgi?id=1669367
>
> Setting the value to "Off" is not ideal for multipath devices where a
> single path failure causes the VM to pause.

A single path failure should be transparent to qemu. multipath will fail over
the I/O to another path. The I/O will fail only if all paths are down and
(with the default configuration) the multipath path checkers have failed 4
times.
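
As a rough illustration of the "failed 4 times" window, here is a minimal
sketch that estimates how long multipath keeps queueing I/O after the last
path goes down. It assumes the retry count maps to multipath's
no_path_retry setting and that retries happen once per polling_interval;
the 5 second interval is an assumed value, not taken from this thread.

```python
def seconds_until_io_error(no_path_retry: int, polling_interval_s: int) -> int:
    """Approximate time I/O stays queued after all paths have failed.

    Assumption: one path-checker retry per polling interval, so the
    window is simply retries * interval. Real timing also depends on
    checker timeouts and on when the last path was marked failed.
    """
    if no_path_retry < 0 or polling_interval_s <= 0:
        raise ValueError("retries must be >= 0 and interval must be > 0")
    return no_path_retry * polling_interval_s


# 4 retries (from the text above), 5 second interval (assumed value).
window = seconds_until_io_error(no_path_retry=4, polling_interval_s=5)
print(f"I/O errors reach qemu after roughly {window} seconds")
```

An outage shorter than this window should not be seen by qemu at all;
only a longer one reaches the disk error policy.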

> It puts serious restrictions on
> the DR situation and, unlike VMware and Hyper-V, oVirt is not able to
> support the DR functionality -
> https://bugzilla.redhat.com/show_bug.cgi?id=1314160

Although in this bug we see that a failover that looks successful from the
multipath and vdsm point of view ended in a paused VM:
https://bugzilla.redhat.com/1860377

Maybe Ben can explain how this can happen.

I hope that qemu will provide more info on errors in the future. If we had a
log of the failed I/O, it could be helpful.

> While we wait for RFE, the proposal here is to revise the out of the box
> behavior for LUNs. For LUNs, we should propagate the errors to VM rather
> than directly stopping those. This will allow us to handle short-term
> multipath outages and improve availability. This is a simple change in
> behavior but will have a positive impact. I would like to seek
> feedback about this to make sure that everyone is ok with the proposal.

I think it makes sense, but this is just a default, and it cannot work
for all cases.

This can end in a broken VM with a read-only file system that must be
rebooted, while with error_policy="stop", failover may be transparent to
the VM even if it was paused for a short time.
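
To make the trade-off concrete, here is a minimal sketch of how the disk
property could map to the error_policy attribute of the libvirt <driver>
element. The function names are hypothetical, and using "report" for
propagate_error=On is an assumption for illustration; libvirt also accepts
"ignore" and "enospace", and the real engine/vdsm mapping may differ.

```python
# Values libvirt accepts for the <driver error_policy="..."> attribute.
LIBVIRT_ERROR_POLICIES = {"stop", "report", "ignore", "enospace"}


def error_policy_for(propagate_error: str) -> str:
    """Map the disk property (On/Off) to a libvirt error policy.

    Off -> "stop":   qemu pauses the VM on an I/O error.
    On  -> "report": the error is reported to the guest (assumed value).
    """
    policy = "report" if propagate_error.strip().lower() == "on" else "stop"
    assert policy in LIBVIRT_ERROR_POLICIES
    return policy


def driver_element(propagate_error: str) -> str:
    """Build an illustrative <driver> element for the disk XML."""
    return f'<driver name="qemu" error_policy="{error_policy_for(propagate_error)}"/>'


print(driver_element("Off"))
print(driver_element("On"))
```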

I would start by making the engine defaults configurable using
engine-config, so different oVirt distributions can use different defaults.
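
A minimal sketch of what such a configurable default could look like,
with one option per disk type. The option names below are hypothetical
and are not existing engine-config keys; the shipped defaults stay Off to
match the current behavior.

```python
from typing import Dict, Optional

# Hypothetical option names, invented for this sketch.
SHIPPED_DEFAULTS: Dict[str, str] = {
    "PropagateDiskErrorsImage": "Off",
    "PropagateDiskErrorsLun": "Off",
}


def default_propagate_error(disk_type: str,
                            overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the default propagate_error value for a new disk.

    `overrides` stands in for values a distribution would set through
    engine configuration; anything not overridden uses the shipped default.
    """
    config = {**SHIPPED_DEFAULTS, **(overrides or {})}
    key = ("PropagateDiskErrorsLun" if disk_type == "lun"
           else "PropagateDiskErrorsImage")
    return config[key]


# A distribution that wants errors propagated for LUNs only:
site = {"PropagateDiskErrorsLun": "On"}
print(default_propagate_error("lun", site))    # LUN default is overridden
print(default_propagate_error("image", site))  # image default is unchanged
```

Keeping the per-type defaults separate lets a distribution change the LUN
behavior proposed above without touching image disks.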

Nir
