Hi Pranith,

those bricks live on different servers connected to the same switch: the only possibility I can see is that the switch went down for some reason; it is our only single point of failure. The servers themselves never went down at the same time.
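To double-check that, I guess something like the following could be run on the machines mounting the volume. This is only a rough, untested sketch: it assumes the default GlusterFS log directory /var/log/glusterfs and that connection losses show up in the logs with the word "disconnect", which may not be the case on every installation:

# current state of the bricks as seen by glusterd
gluster volume status VmDir

# look for brick disconnections around the suspected time window,
# roughly ordered by timestamp
grep -ih disconnect /var/log/glusterfs/*.log | sort

If both bricks report disconnections around the same timestamp, that would point to the switch (or something in between) rather than to the servers themselves.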
However, I do not understand why, if I run getfattr continuously:

watch -n1 'getfattr -d -m . -e hex 1814/images/disk.0'

the output keeps alternating between:

trusted.afr.VmDir-client-0=0x000000010000000000000000
trusted.afr.VmDir-client-1=0x000000010000000000000000

and:

trusted.afr.VmDir-client-0=0x000000000000000000000000
trusted.afr.VmDir-client-1=0x000000000000000000000000

Again, this happens with every "big" file. Does this suggest a network problem, maybe? One of the servers has 1 GbE while the other one has faster 10 GbE, but I do not think this alone is enough to continuously de-synchronize the bricks... A small sketch to sample both bricks' attributes at the same instant is appended at the very end of this message, after the quoted thread.

Cheers
--
: Dario Berzano
: CERN PH-SFT & Università di Torino (Italy)
: Wiki: http://newton.ph.unito.it/~berzano
: GPG: http://newton.ph.unito.it/~berzano/gpg
: Mobiles: +41 766124782 (CH), +39 3487222520 (IT)

On 17 Sep 2012, at 00:11, Pranith Kumar Karampuri <[email protected]> wrote:

> 1814/images/disk.0 has pending data change log for both subvolumes, i.e.
> 0x00000001. This happens when both the bricks go out at the same time while
> an operation is in progress. Did that happen?
>
> Pranith.
>
> ----- Original Message -----
> From: "Dario Berzano" <[email protected]>
> To: "Pranith Kumar Karampuri" <[email protected]>
> Cc: "gluster-users" <[email protected]>
> Sent: Sunday, September 16, 2012 9:20:23 PM
> Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>
> Ok, here's the output for 1816/images/disk.0:
>
> # file: bricks/VmDir01/1816/images/disk.0
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.VmDir-client-0=0x000000000000000000000000
> trusted.afr.VmDir-client-1=0x000000000000000000000000
> trusted.gfid=0x1cef9d386f1c4424af6d95dfbcf2989b
>
> # file: bricks/VmDir02/1816/images/disk.0
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.VmDir-client-0=0x000000000000000000000000
> trusted.afr.VmDir-client-1=0x000000000000000000000000
> trusted.gfid=0x1cef9d386f1c4424af6d95dfbcf2989b
>
> And for 1814/images/disk.0:
>
> # file: bricks/VmDir01/1814/images/disk.0
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.VmDir-client-0=0x000000010000000000000000
> trusted.afr.VmDir-client-1=0x000000010000000000000000
> trusted.gfid=0xaabc0c344ccc4cfe8e2ed588dd78323b
>
> # file: bricks/VmDir02/1814/images/disk.0
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.VmDir-client-0=0x000000010000000000000000
> trusted.afr.VmDir-client-1=0x000000010000000000000000
> trusted.gfid=0xaabc0c344ccc4cfe8e2ed588dd78323b
>
> Note that these are just two sample files, since the problem occurs with 100%
> of our "big" virtual machines. Here's the whole content of the GlusterFS
> volume along with file sizes:
>
> 6.3G ./1981/images/disk.0
> 53M  ./1820/images/disk.0
> 9.7G ./1838/images/disk.0
> 10G  ./1819/images/disk.0
> 9.2G ./1818/images/disk.0
> 10G  ./1816/images/disk.0
> 53M  ./1962/images/disk.0
> 10G  ./1814/images/disk.0
> 6.2G ./1988/images/disk.0
> 10G  ./1817/images/disk.0
> 53M  ./1821/images/disk.0
>
> We currently have 11 running VMs. The "small" ones (53 MB) have never shown
> any problem so far. *All* the other VMs (6 to 10 GB) periodically show up in
> the output of:
>
> gluster volume heal VmDir info
>
> when there is some intense I/O occurring, disappearing again shortly afterwards.
>
> Thanks, cheers,
> --
> : Dario Berzano
> : CERN PH-SFT & Università di Torino (Italy)
> : Wiki: http://newton.ph.unito.it/~berzano
> : GPG: http://newton.ph.unito.it/~berzano/gpg
> : Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
>
> On 14 Sep 2012, at 18:21, Pranith Kumar Karampuri <[email protected]> wrote:
>
>> Dario,
>> Ok, that confirms that it is not a split-brain. Could you post the getfattr
>> output I requested as well? What is the size of the VM files?
>>
>> Pranith
>> ----- Original Message -----
>> From: "Dario Berzano" <[email protected]>
>> To: "Pranith Kumar Karampuri" <[email protected]>
>> Cc: "<[email protected]>" <[email protected]>
>> Sent: Friday, September 14, 2012 9:42:38 PM
>> Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>>
>> # gluster volume heal VmDir info healed
>>
>> Heal operation on volume VmDir has been successful
>>
>> Brick one-san-01:/bricks/VmDir01
>> Number of entries: 259
>> Segmentation fault (core dumped)
>>
>> (same story for heal-failed), which seems to be exactly this bug:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=836421
>>
>> Should I upgrade to the latest QA RPMs to see what is going on?
>>
>> Btw, with split-brain I have no entries:
>>
>> Heal operation on volume VmDir has been successful
>>
>> Brick one-san-01:/bricks/VmDir01
>> Number of entries: 0
>>
>> Brick one-san-02:/bricks/VmDir02
>> Number of entries: 0
>>
>> Thank you, cheers,
>> --
>> : Dario Berzano
>> : CERN PH-SFT & Università di Torino (Italy)
>> : Wiki: http://newton.ph.unito.it/~berzano
>> : GPG: http://newton.ph.unito.it/~berzano/gpg
>> : Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
>>
>> On 14 Sep 2012, at 17:16, Pranith Kumar Karampuri <[email protected]> wrote:
>>
>> hi Dario,
>> Could you post the output of the following commands:
>> gluster volume heal VmDir info healed
>> gluster volume heal VmDir info split-brain
>>
>> Also provide the output of 'getfattr -d -m . -e hex' on both the bricks for
>> the two files listed in the output of 'gluster volume heal VmDir info'.
>>
>> Pranith.
>>
>> ----- Original Message -----
>> From: "Dario Berzano" <[email protected]>
>> To: [email protected]
>> Sent: Friday, September 14, 2012 6:57:32 PM
>> Subject: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>>
>> Hello,
>>
>> in our computing centre we have an infrastructure with a GlusterFS volume
>> made of two bricks in replicated mode:
>>
>> Volume Name: VmDir
>> Type: Replicate
>> Volume ID: 9aab85df-505c-460a-9e5b-381b1bf3c030
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: one-san-01:/bricks/VmDir01
>> Brick2: one-san-02:/bricks/VmDir02
>>
>> We are using this volume to store the running images of some KVM virtual
>> machines, and thought we could benefit from the replicated storage in order
>> to achieve more robustness as well as the ability to live-migrate VMs.
>>
>> Our GlusterFS volume VmDir is mounted on several (three at the moment)
>> hypervisors.
>>
>> However, in many cases (difficult to reproduce: the best way is to stress
>> VM I/O), either when one brick becomes unavailable for some reason or when
>> we perform live migrations, the virtual machines remount the filesystems on
>> their virtual disks read-only.
>> At the same time, on the hypervisors mounting the GlusterFS partitions, we
>> spot kernel messages like:
>>
>> INFO: task kvm:13560 blocked for more than 120 seconds.
>>
>> By googling it I have found some "workarounds" to mitigate this problem,
>> like mounting disks within the virtual machines with barrier=0:
>>
>> http://invalidlogic.com/2012/04/28/ubuntu-precise-on-xenserver-disk-errors/
>>
>> but I actually fear damaging my virtual machine disks by doing such a thing!
>>
>> AFAIK, from GlusterFS v3.3 self-healing should be performed server-side
>> (with no self-healing at all on the clients, and with big files locked
>> granularly). When I connect to my GlusterFS pool and monitor the
>> self-healing status continuously:
>>
>> watch -n1 'gluster volume heal VmDir info'
>>
>> I obtain an output like:
>>
>> Heal operation on volume VmDir has been successful
>>
>> Brick one-san-01:/bricks/VmDir01
>> Number of entries: 2
>> /1814/images/disk.0
>> /1816/images/disk.0
>>
>> Brick one-san-02:/bricks/VmDir02
>> Number of entries: 2
>> /1816/images/disk.0
>> /1814/images/disk.0
>>
>> with a list of virtual machine disks healed by GlusterFS. These and other
>> files continuously appear in and disappear from the list.
>>
>> This is a behavior I don't understand at all: does it mean that those files
>> continuously get corrupted and healed, and self-healing is just a natural
>> part of the replication process? Or is some kind of corruption actually
>> happening on our virtual disks for some reason? Is this related to the
>> "remount read-only" problem?
>>
>> A more general question, maybe, would be: is GlusterFS v3.3 ready for
>> storing running virtual machines (and is there some special configuration
>> option needed on the volumes and clients for that)?
>>
>> Thank you in advance for shedding some light...
>>
>> Regards,
>>
>> --
>> : Dario Berzano
>> : CERN PH-SFT & Università di Torino (Italy)
>> : Wiki: http://newton.ph.unito.it/~berzano
>> : GPG: http://newton.ph.unito.it/~berzano/gpg
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
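P.S.: here is the small sketch mentioned at the top of this message, for sampling the trusted.afr changelog attributes of one file on both bricks at (roughly) the same instant, to help tell transient in-flight changelogs from ones that stay non-zero. It is an untested draft: it assumes passwordless root SSH from wherever it runs to both servers, and it uses the brick paths and the example file quoted earlier in this thread.

#!/bin/bash
# Rough sketch (untested): once per second, dump the AFR changelog xattrs of
# one file on both bricks so the two copies can be compared side by side.
F="1814/images/disk.0"   # example file from this thread
while true; do
  date
  ssh root@one-san-01 "getfattr -d -m . -e hex /bricks/VmDir01/$F" 2>/dev/null | grep -E 'file:|trusted\.afr'
  ssh root@one-san-02 "getfattr -d -m . -e hex /bricks/VmDir02/$F" 2>/dev/null | grep -E 'file:|trusted\.afr'
  echo '----'
  sleep 1
done

If the attributes drop back to all zeroes within a second or two, they are probably just the normal pending-changelog marking of writes in flight; if they stay at 0x00000001... on both bricks, something is not being cleaned up.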
