Hello Pranith,

ok, I understand that each time a write operation is performed the update flag is set, then reset afterwards, when the update is complete.
I really don't know GlusterFS internals, but... what if a live migration or brick failure happens *while* these updates are ongoing? The problem is, my VMs are definitely *not* doing fine :(

My former GlusterFS configuration had only one brick, and everything went perfectly. Problems started to arise as soon as we migrated to a replicated infrastructure. I wonder if the problem is:

- our network;
- some obscure configuration internal to the VMs;
- GlusterFS;
- a combination of all the above.

Since the only thing that changed is our GlusterFS configuration, I'm "pointing the finger" at it: we put this replicated configuration in place to avoid having single points of failure, *but* we have been experiencing more failures since then!

To me it seems very related to this issue, if not exactly the same problem, although on a much smaller scale:

http://gluster.org/pipermail/gluster-users/2012-September/011444.html

Of course the problem might be network-related; I am currently running tests to sort it out.

Cheers
--
: Dario Berzano
: CERN PH-SFT & Università di Torino (Italy)
: Wiki: http://newton.ph.unito.it/~berzano
: GPG: http://newton.ph.unito.it/~berzano/gpg
: Mobiles: +41 766124782 (CH), +39 3487222520 (IT)

On 17 Sep 2012, at 10:41, Pranith Kumar Karampuri <[email protected]> wrote:

> Dario,
> Nothing to worry then :-). It was a transient state. Every time an update
> is done they are marked, and after the update is over they are reset.
> Similarly, the output of 'gluster volume heal <volname> info' keeps giving
> entries while these flags are set, and shows no output when these flags are
> reset. I thought it was a persistent one. Seems like your VM files are doing
> fine.
>
> Pranith.
> ----- Original Message -----
> From: "Dario Berzano" <[email protected]>
> To: "Pranith Kumar Karampuri" <[email protected]>
> Cc: "gluster-users" <[email protected]>
> Sent: Monday, September 17, 2012 1:36:56 PM
> Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>
> Hi Pranith,
>
> those bricks are on different servers connected to the same switch: the
> only possibility I see is that the switch went down for some reason; it is
> our only single point of failure. The servers themselves never went down
> at the same time.
>
> I do not understand, however, why if I run getfattr continuously:
>
> watch -n1 'getfattr -d -m . -e hex 1814/images/disk.0'
>
> I get, alternating:
>
> trusted.afr.VmDir-client-0=0x000000010000000000000000
> trusted.afr.VmDir-client-1=0x000000010000000000000000
>
> and:
>
> trusted.afr.VmDir-client-0=0x000000000000000000000000
> trusted.afr.VmDir-client-1=0x000000000000000000000000
>
> This again happens with every "big" file.
>
> Does this suggest a network problem, maybe? One of the servers has 1 GbE
> while the other one has faster 10 GbE, but I do not think this is enough
> to continuously de-synchronize the bricks...
>
> Cheers
> --
> : Dario Berzano
> : CERN PH-SFT & Università di Torino (Italy)
> : Wiki: http://newton.ph.unito.it/~berzano
> : GPG: http://newton.ph.unito.it/~berzano/gpg
> : Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
>
> On 17 Sep 2012, at 00:11, Pranith Kumar Karampuri <[email protected]> wrote:
>
>> 1814/images/disk.0 has a pending data change log for both subvolumes,
>> i.e. 0x00000001. This happens when both bricks go out at the same time
>> while an operation is in progress. Did that happen?
>>
>> Pranith.
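For reference, a `trusted.afr.*` changelog value like the ones quoted above packs three big-endian 32-bit counters: pending data, metadata, and entry operations (this layout is the GlusterFS 3.x AFR convention; the `decode_afr` helper below is an illustrative sketch, not a tool from this thread):

```shell
#!/bin/bash
# Decode a trusted.afr.* changelog value into its three pending-operation
# counters, assuming the GlusterFS 3.x AFR layout: 12 bytes, i.e. three
# big-endian 32-bit integers for data, metadata and entry operations.
decode_afr() {
  local hex=${1#0x}   # strip the "0x" prefix getfattr prints in hex mode
  printf 'data=%d metadata=%d entry=%d\n' \
    "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}

decode_afr 0x000000010000000000000000   # one pending data operation
decode_afr 0x000000000000000000000000   # all counters clean
```

A non-zero data counter on both bricks while writes are in flight is exactly the transient state Pranith describes; it only indicates trouble if it persists after I/O stops.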
>>
>> ----- Original Message -----
>> From: "Dario Berzano" <[email protected]>
>> To: "Pranith Kumar Karampuri" <[email protected]>
>> Cc: "gluster-users" <[email protected]>
>> Sent: Sunday, September 16, 2012 9:20:23 PM
>> Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>>
>> Ok, here's the output for 1816/images/disk.0:
>>
>> # file: bricks/VmDir01/1816/images/disk.0
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.VmDir-client-0=0x000000000000000000000000
>> trusted.afr.VmDir-client-1=0x000000000000000000000000
>> trusted.gfid=0x1cef9d386f1c4424af6d95dfbcf2989b
>>
>> # file: bricks/VmDir02/1816/images/disk.0
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.VmDir-client-0=0x000000000000000000000000
>> trusted.afr.VmDir-client-1=0x000000000000000000000000
>> trusted.gfid=0x1cef9d386f1c4424af6d95dfbcf2989b
>>
>> And for 1814/images/disk.0:
>>
>> # file: bricks/VmDir01/1814/images/disk.0
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.VmDir-client-0=0x000000010000000000000000
>> trusted.afr.VmDir-client-1=0x000000010000000000000000
>> trusted.gfid=0xaabc0c344ccc4cfe8e2ed588dd78323b
>>
>> # file: bricks/VmDir02/1814/images/disk.0
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.VmDir-client-0=0x000000010000000000000000
>> trusted.afr.VmDir-client-1=0x000000010000000000000000
>> trusted.gfid=0xaabc0c344ccc4cfe8e2ed588dd78323b
>>
>> Note that these are just two sample files, since the problem occurs with
>> 100% of our "big" virtual machines.
>> Here's the whole content of the GlusterFS volume along with file sizes:
>>
>> 6.3G ./1981/images/disk.0
>> 53M  ./1820/images/disk.0
>> 9.7G ./1838/images/disk.0
>> 10G  ./1819/images/disk.0
>> 9.2G ./1818/images/disk.0
>> 10G  ./1816/images/disk.0
>> 53M  ./1962/images/disk.0
>> 10G  ./1814/images/disk.0
>> 6.2G ./1988/images/disk.0
>> 10G  ./1817/images/disk.0
>> 53M  ./1821/images/disk.0
>>
>> We currently have 11 running VMs. The "small" ones (53 MB) have never
>> shown any problem so far. *All* the other VMs (6 to 10 GB) periodically
>> show up in the output of:
>>
>> gluster volume heal VmDir info
>>
>> when there is some intense I/O occurring, disappearing shortly afterwards.
>>
>> Thanks, cheers,
>> --
>> : Dario Berzano
>> : CERN PH-SFT & Università di Torino (Italy)
>> : Wiki: http://newton.ph.unito.it/~berzano
>> : GPG: http://newton.ph.unito.it/~berzano/gpg
>> : Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
>>
>> On 14 Sep 2012, at 18:21, Pranith Kumar Karampuri <[email protected]> wrote:
>>
>>> Dario,
>>> Ok, that confirms that it is not a split-brain. Could you post the getfattr
>>> output I requested as well? What is the size of the VM files?
>>>
>>> Pranith
>>> ----- Original Message -----
>>> From: "Dario Berzano" <[email protected]>
>>> To: "Pranith Kumar Karampuri" <[email protected]>
>>> Cc: "<[email protected]>" <[email protected]>
>>> Sent: Friday, September 14, 2012 9:42:38 PM
>>> Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>>>
>>> # gluster volume heal VmDir info healed
>>>
>>> Heal operation on volume VmDir has been successful
>>>
>>> Brick one-san-01:/bricks/VmDir01
>>> Number of entries: 259
>>> Segmentation fault (core dumped)
>>>
>>> (same story for heal-failed) which seems to be exactly this bug:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=836421
>>>
>>> Should I upgrade to the latest QA RPMs to see what is going on?
>>>
>>> Btw, with split-brain I have no entries:
>>>
>>> Heal operation on volume VmDir has been successful
>>>
>>> Brick one-san-01:/bricks/VmDir01
>>> Number of entries: 0
>>>
>>> Brick one-san-02:/bricks/VmDir02
>>> Number of entries: 0
>>>
>>> Thank you, cheers,
>>> --
>>> : Dario Berzano
>>> : CERN PH-SFT & Università di Torino (Italy)
>>> : Wiki: http://newton.ph.unito.it/~berzano
>>> : GPG: http://newton.ph.unito.it/~berzano/gpg
>>> : Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
>>>
>>> On 14 Sep 2012, at 17:16, Pranith Kumar Karampuri <[email protected]> wrote:
>>>
>>> hi Dario,
>>> Could you post the output of the following commands:
>>>
>>> gluster volume heal VmDir info healed
>>> gluster volume heal VmDir info split-brain
>>>
>>> Also provide the output of 'getfattr -d -m . -e hex' on both bricks for
>>> the two files listed in the output of 'gluster volume heal VmDir info'.
>>>
>>> Pranith.
>>>
>>> ----- Original Message -----
>>> From: "Dario Berzano" <[email protected]>
>>> To: [email protected]
>>> Sent: Friday, September 14, 2012 6:57:32 PM
>>> Subject: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>>>
>>> Hello,
>>>
>>> in our computing centre we have an infrastructure with a GlusterFS volume
>>> made of two bricks in replicated mode:
>>>
>>> Volume Name: VmDir
>>> Type: Replicate
>>> Volume ID: 9aab85df-505c-460a-9e5b-381b1bf3c030
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: one-san-01:/bricks/VmDir01
>>> Brick2: one-san-02:/bricks/VmDir02
>>>
>>> We are using this volume to store the running images of some KVM virtual
>>> machines, and thought we could benefit from the replicated storage in
>>> order to achieve more robustness as well as the ability to live-migrate VMs.
>>>
>>> Our GlusterFS volume VmDir is mounted on several hypervisors (three at
>>> the moment).
>>>
>>> However, in many cases (it is difficult to reproduce: the best way is to
>>> stress VM I/O), either when one brick becomes unavailable for some reason
>>> or when we perform live migrations, virtual machines decide to remount the
>>> filesystems on their virtual disks read-only. At the same time, on the
>>> hypervisors mounting the GlusterFS partitions, we spot kernel messages
>>> like:
>>>
>>> INFO: task kvm:13560 blocked for more than 120 seconds.
>>>
>>> By googling it I have found some "workarounds" to mitigate this problem,
>>> like mounting disks within virtual machines with barrier=0:
>>>
>>> http://invalidlogic.com/2012/04/28/ubuntu-precise-on-xenserver-disk-errors/
>>>
>>> but I actually fear damaging my virtual machine disks by doing such a
>>> thing!
>>>
>>> AFAIK, from GlusterFS v3.3 self-healing should be performed server-side
>>> (with no self-healing at all performed on the clients, and with big files
>>> locked granularly). When I connect to my GlusterFS pool, if I monitor the
>>> self-healing status continuously:
>>>
>>> watch -n1 'gluster volume heal VmDir info'
>>>
>>> I obtain an output like:
>>>
>>> Heal operation on volume VmDir has been successful
>>>
>>> Brick one-san-01:/bricks/VmDir01
>>> Number of entries: 2
>>> /1814/images/disk.0
>>> /1816/images/disk.0
>>>
>>> Brick one-san-02:/bricks/VmDir02
>>> Number of entries: 2
>>> /1816/images/disk.0
>>> /1814/images/disk.0
>>>
>>> with a list of virtual machine disks healed by GlusterFS. These and other
>>> files continuously appear and disappear from the list.
>>>
>>> This is a behavior I don't understand at all: does this mean that those
>>> files continuously get corrupted and healed, and self-healing is just a
>>> natural part of the replication process?!
>>> Or is some kind of corruption
>>> actually happening on our virtual disks for some reason? Is this related
>>> to the "remount read-only" problem?
>>>
>>> A more general question, maybe, would be: is GlusterFS v3.3 ready for
>>> storing running virtual machines (and is there some special configuration
>>> option needed on the volumes and clients for that)?
>>>
>>> Thank you in advance for shedding some light...
>>>
>>> Regards,
>>>
>>> --
>>> : Dario Berzano
>>> : CERN PH-SFT & Università di Torino (Italy)
>>> : Wiki: http://newton.ph.unito.it/~berzano
>>> : GPG: http://newton.ph.unito.it/~berzano/gpg
>>> _______________________________________________
>>> Gluster-users mailing list
>>> [email protected]
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
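On the question of special volume options for hosting running VMs: a set of tunables commonly suggested for VM-image workloads on replicated GlusterFS volumes is sketched below. These settings are not from this thread and their availability varies by version; verify each one against `gluster volume set help` on your installation before applying:

```shell
# Commonly suggested tunables for volumes hosting large VM images
# (assumptions - check availability on your GlusterFS version first).
gluster volume set VmDir cluster.eager-lock on         # hold locks across writes to the same big file
gluster volume set VmDir performance.quick-read off    # small-file optimization, unhelpful for VM images
gluster volume set VmDir performance.read-ahead off    # the guest OS already does its own read-ahead
gluster volume set VmDir performance.io-cache off      # avoid caching data the guest caches anyway
gluster volume set VmDir performance.stat-prefetch off # stat metadata of VM images changes constantly
```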
