Perry Myers wrote:
>> As long as you expect the VM to enforce reliability on the raw storage
>> devices then you are going to have problems with restarting HA VMs. If
>> you switch your thinking to making the storage operations HA, then all
>> you need is a response cache.
>>
>> A restarted VM replays the operation, and the cached response is
>> retransmitted (or the operation is benignly re-applied). Without
>> defining the operations so that they can be benignly re-applied or
>> adding a response cache you will always be able to come up with some
>> order of failure that won't work. There is no cost-effective way to
>> guarantee that you snapshot the VM only when there is no in-flight
>> storage activity.
> How is this any different than a bare metal host crashing while writes are
> in flight either to a local disk or FC disk? When something crashes (be it
> physical or virtual) you're always going to lose some data that was in flight
> but not committed to disk (network has same issue). It's up to individual
> applications to be resilient to this.

Don't think of a storage write as being a write to a device. It is a
request to a service made in the context of a session. The session
protocol includes the necessary logic to complete the transaction even
when a TCP connection is broken.

Examples of this include multi-connection iSCSI and NFSv4, both of which
can be used to back a virtual disk.

When a VM is migrated you break the connections that were made by it or
on its behalf. The pre-existing session logic will make in-progress
operations retry until they are successful.

The key is thinking of block storage as a service, rather than as a
device.
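
To make the response-cache idea concrete, here is a minimal sketch in Python. It is a toy model and not how iSCSI or NFSv4 actually implement their sessions: the names BlockService, FlakyConnection, and write_with_retry are hypothetical, and the assumption is that each request carries a session ID and a sequence number that survive a broken connection.

```python
from dataclasses import dataclass, field


@dataclass
class BlockService:
    """Toy block-storage service (hypothetical, for illustration only)."""
    blocks: dict = field(default_factory=dict)      # lba -> data
    responses: dict = field(default_factory=dict)   # (session, seq) -> response
    applied: int = 0                                # writes actually executed

    def write(self, session_id, seq, lba, data):
        key = (session_id, seq)
        if key in self.responses:
            # Replay of a completed operation: retransmit the cached
            # response instead of applying the write a second time.
            return self.responses[key]
        self.blocks[lba] = data
        self.applied += 1
        response = ("ok", lba, len(data))
        self.responses[key] = response
        return response


class FlakyConnection:
    """Hypothetical transport that loses the first few replies,
    standing in for a TCP connection broken by a migration or restart."""

    def __init__(self, service, drop_first=1):
        self.service = service
        self.drops_left = drop_first

    def send_write(self, session_id, seq, lba, data):
        # The service processes the request even if the reply is lost.
        response = self.service.write(session_id, seq, lba, data)
        if self.drops_left > 0:
            self.drops_left -= 1
            raise ConnectionError("connection lost before reply arrived")
        return response


def write_with_retry(conn, session_id, seq, lba, data, max_attempts=5):
    """Session-level retry: reissue the same (session, seq) until a reply arrives."""
    for _ in range(max_attempts):
        try:
            return conn.send_write(session_id, seq, lba, data)
        except ConnectionError:
            continue
    raise RuntimeError("write did not complete")


service = BlockService()
conn = FlakyConnection(service, drop_first=2)
result = write_with_retry(conn, "vm-42", 1, 100, b"payload")
```

In this sketch the write is executed once even though the request is sent three times; the second and third attempts are answered from the cache. The retry lives in the session layer, so the guest never has to know the connection was broken.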
