Aren't we talking about this patch? https://git.proxmox.com/?p=pve-qemu-kvm.git;a=blob;f=debian/patches/gluster-backupserver.patch;h=ad241ee1154ebbd536d7c2c7987d86a02255aba2;hb=HEAD
2015-10-26 22:56 GMT+02:00 Niels de Vos <[email protected]>:
> On Thu, Oct 22, 2015 at 08:45:04PM +0200, André Bauer wrote:
> > Hi,
> >
> > I have a 4 node GlusterFS 3.5.6 cluster.
> >
> > My VM images are in a replicated distributed volume which is accessed
> > from kvm/qemu via libgfapi.
> >
> > The mount is against storage.domain.local, which has IPs for all 4
> > Gluster nodes set in DNS.
> >
> > When one of the Gluster nodes goes down (accidental reboot), a lot of
> > the VMs get a read-only filesystem, even when the node comes back up.
> >
> > How can I prevent this?
> > I expect the VM to just use the replicated file on the other node,
> > without getting a read-only fs.
> >
> > Any hints?
>
> There are at least two timeouts involved in this problem:
>
> 1. The filesystem in a VM can go read-only when the virtual disk where
>    the filesystem is located does not respond for a while.
>
> 2. When a storage server that holds a replica of the virtual disk
>    becomes unreachable, the Gluster client (qemu+libgfapi) waits for at
>    most network.ping-timeout seconds before it resumes I/O.
>
> Once a filesystem in a VM goes read-only, you might be able to fsck it
> and re-mount it read-write again. It is not something a VM will do by
> itself.
>
> The timeouts for (1) are set in sysfs:
>
> $ cat /sys/block/sda/device/timeout
> 30
>
> 30 seconds is the default for sd devices, and for testing you can
> change it with an echo:
>
> # echo 300 > /sys/block/sda/device/timeout
>
> This is not a persistent change; you can create a udev rule to apply it
> at boot.
>
> Some filesystems offer a mount option that changes the behaviour after
> a disk error is detected. "man mount" shows the "errors" option for
> ext*. Changing this to "continue" is not recommended; "abort" or
> "panic" are the safest for your data.
>
> The timeout mentioned in (2) is for the Gluster volume, and is checked
> by the client. When a client writes to a replicated volume, the write
> needs to be acknowledged by both/all replicas. The client (libgfapi)
> delays the reply to the application (qemu) until both/all replies from
> the replicas have been received. This delay is configured as the volume
> option network.ping-timeout (42 seconds by default).
>
> Now, if the VM returns block errors after 30 seconds, and the client
> waits up to 42 seconds for recovery, there is an issue... So your
> solution could be to increase the timeout for error detection of the
> disks inside the VMs, and/or to decrease network.ping-timeout.
>
> It would be interesting to know if adapting these values prevents the
> read-only occurrences in your environment. If you do any testing with
> this, please keep me informed about the results.
>
> Niels
>
> _______________________________________________
> Gluster-devel mailing list
> [email protected]
> http://www.gluster.org/mailman/listinfo/gluster-devel

--
Best regards,
Roman.
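For anyone who wants to try Niels' suggestions, a minimal sketch of the three
adjustments (the udev rule file name, the 300 and 10 second values, and the
volume name "vmstore" are placeholders for illustration, not tested
recommendations):

# Inside the VM: make the larger disk timeout persistent with a udev rule,
# e.g. /etc/udev/rules.d/99-disk-timeout.rules (example file name):
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", \
  RUN+="/bin/sh -c 'echo 300 > /sys/block/%k/device/timeout'"

# Inside the VM: have ext* filesystems stop on errors instead of continuing,
# e.g. via the "errors" mount option in /etc/fstab:
# /dev/sda1  /  ext4  defaults,errors=panic  0  1

# On a Gluster node: lower the ping timeout for the volume so the client
# resumes I/O before the guest's disk timeout expires:
gluster volume set vmstore network.ping-timeout 10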
