On 02/15/2012 03:41 AM, Perry Myers wrote:
As long as you expect the VM to enforce reliability on the raw
storage devices, you are going to have problems with restarting
HA VMs. If you switch your thinking to making the storage operations
HA, then all you need is a response cache.
A restarted VM replays the operation, and the cached response is
retransmitted (or the operation is benignly re-applied). Unless you
define the operations so that they can be benignly re-applied, or
add a response cache, you will always be able to come up with some
order of failures that won't work. There is no cost-effective way to
guarantee that you snapshot the VM only when there is no in-flight
storage activity.
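To make the response-cache idea concrete, here is a minimal sketch in
Python. It is an illustration only, not anything oVirt implements: the
StorageService class, the op_id request identifier and the in-memory
dictionaries are all hypothetical. A real service would have to persist
the write and its cached response together, durably, for the replay to
be safe.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StorageService:
    """Hypothetical HA storage front end with a response cache."""
    blocks: Dict[int, bytes] = field(default_factory=dict)
    responses: Dict[str, str] = field(default_factory=dict)  # op_id -> reply
    applied: int = 0  # number of writes actually applied to storage

    def write(self, op_id: str, block: int, data: bytes) -> str:
        # A replayed operation is answered from the cache, not re-applied.
        if op_id in self.responses:
            return self.responses[op_id]
        self.blocks[block] = data
        self.applied += 1
        response = f"ok:{block}:{len(data)}"
        # Assumption: in a real system this record and the write above
        # would be committed atomically.
        self.responses[op_id] = response
        return response


svc = StorageService()
first = svc.write("vm1-op-0001", 7, b"payload")
# The VM is restarted and replays the same in-flight operation.
replayed = svc.write("vm1-op-0001", 7, b"payload")
```

The restarted VM gets the same response it would have received the
first time, and the write is applied only once, whatever the order of
failures between the request and the reply.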
How is this any different from a bare metal host crashing while writes
are in flight, either to a local disk or an FC disk? When something
crashes (be it physical or virtual) you're always going to lose some
data that was in flight but not committed to disk (the network has the
same issue). It's up to individual applications to be resilient to this.
I think this issue is somewhat orthogonal to simply providing reduced
MTTR by restarting failed services or VMs.
Don't you fence the other node first to make sure it won't write after
you start another one?
Here we are talking about moving the VM, without fencing the host.
_______________________________________________
Arch mailing list
http://lists.ovirt.org/mailman/listinfo/arch