Hello Pranith,

ok, I understand that each time a write operation is performed the update flag is set, then reset afterwards, when the update is complete.
I really don't know GlusterFS internals, but... what if a live migration or brick failure happens *while* these updates are ongoing? The problem is, my VMs are definitely *not* doing fine :(

My former GlusterFS configuration had only one brick, and everything went perfectly. Problems started to arise as soon as we migrated to a replicated infrastructure. I wonder if the problem is:

- our network;
- some obscure configuration internal to the VMs;
- GlusterFS;
- a combination of all the above.

Since the only thing that changed is our GlusterFS configuration, I'm "pointing the finger" at it: we put this replicated configuration in place to avoid having single points of failure, *but* we have been experiencing more failures since then!

To me it seems very related to this issue, if not exactly the same problem, although on a much smaller scale:

http://gluster.org/pipermail/gluster-users/2012-September/011444.html

Of course the problem might be network-related; I am currently running tests to sort it out.

Cheers
--
: Dario Berzano
: CERN PH-SFT & Università di Torino (Italy)
: Wiki: http://newton.ph.unito.it/~berzano
: GPG: http://newton.ph.unito.it/~berzano/gpg
: Mobiles: +41 766124782 (CH), +39 3487222520 (IT)

On 17 Sep 2012, at 10:41, Pranith Kumar Karampuri <[email protected]> wrote:

> Dario,
> Nothing to worry then :-). It was a transient state. Every time an update
> is done they are marked, and after the update is over they are reset.
> Similarly, the output of 'gluster volume heal <volname> info' keeps giving
> entries while these flags are set, and shows no output when these flags are
> reset. I thought it was a persistent one. Seems like your VM files are doing
> fine.
>
> Pranith.
> ----- Original Message -----
> From: "Dario Berzano" <[email protected]>
> To: "Pranith Kumar Karampuri" <[email protected]>
> Cc: "gluster-users" <[email protected]>
> Sent: Monday, September 17, 2012 1:36:56 PM
> Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>
> Hi Pranith,
>
> those bricks are on different servers connected to the same switch: the
> only possibility I see is that the switch went down for some reason; it is
> our only single point of failure. The servers themselves never went down
> at the same time.
>
> I do not understand, however, why if I run getfattr continuously:
>
> watch -n1 'getfattr -d -m . -e hex 1814/images/disk.0'
>
> I get, alternating:
>
> trusted.afr.VmDir-client-0=0x000000010000000000000000
> trusted.afr.VmDir-client-1=0x000000010000000000000000
>
> and:
>
> trusted.afr.VmDir-client-0=0x000000000000000000000000
> trusted.afr.VmDir-client-1=0x000000000000000000000000
>
> This again happens with every "big" file.
>
> Does this suggest a network problem, maybe? One of the servers has 1 GbE
> while the other one has faster 10 GbE, but I do not think this is enough
> to continuously de-synchronize the bricks...
>
> Cheers
> --
> : Dario Berzano
> : CERN PH-SFT & Università di Torino (Italy)
> : Wiki: http://newton.ph.unito.it/~berzano
> : GPG: http://newton.ph.unito.it/~berzano/gpg
> : Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
>
> On 17 Sep 2012, at 00:11, Pranith Kumar Karampuri <[email protected]> wrote:
>
>> 1814/images/disk.0 has a pending data change log for both subvolumes,
>> i.e. 0x00000001. This happens when both bricks go out at the same time
>> while an operation is in progress. Did that happen?
>>
>> Pranith.
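For reference, a `trusted.afr.*` changelog value like the ones quoted above packs three big-endian 32-bit counters: pending data, metadata, and entry operations (this layout is the GlusterFS 3.x AFR convention; the `decode_afr` helper below is an illustrative sketch, not a tool from this thread):

```shell
#!/bin/bash
# Decode a trusted.afr.* changelog value into its three pending-operation
# counters, assuming the GlusterFS 3.x AFR layout: 12 bytes, i.e. three
# big-endian 32-bit integers for data, metadata and entry operations.
decode_afr() {
  local hex=${1#0x}   # strip the "0x" prefix getfattr prints in hex mode
  printf 'data=%d metadata=%d entry=%d\n' \
    "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}

decode_afr 0x000000010000000000000000   # one pending data operation
decode_afr 0x000000000000000000000000   # all counters clean
```

A non-zero data counter on both bricks while writes are in flight is exactly the transient state Pranith describes; it only indicates trouble if it persists after I/O stops.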
>>
>> ----- Original Message -----
>> From: "Dario Berzano" <[email protected]>
>> To: "Pranith Kumar Karampuri" <[email protected]>
>> Cc: "gluster-users" <[email protected]>
>> Sent: Sunday, September 16, 2012 9:20:23 PM
>> Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>>
>> Ok, here's the output for 1816/images/disk.0:
>>
>> # file: bricks/VmDir01/1816/images/disk.0
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.VmDir-client-0=0x000000000000000000000000
>> trusted.afr.VmDir-client-1=0x000000000000000000000000
>> trusted.gfid=0x1cef9d386f1c4424af6d95dfbcf2989b
>>
>> # file: bricks/VmDir02/1816/images/disk.0
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.VmDir-client-0=0x000000000000000000000000
>> trusted.afr.VmDir-client-1=0x000000000000000000000000
>> trusted.gfid=0x1cef9d386f1c4424af6d95dfbcf2989b
>>
>> And for 1814/images/disk.0:
>>
>> # file: bricks/VmDir01/1814/images/disk.0
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.VmDir-client-0=0x000000010000000000000000
>> trusted.afr.VmDir-client-1=0x000000010000000000000000
>> trusted.gfid=0xaabc0c344ccc4cfe8e2ed588dd78323b
>>
>> # file: bricks/VmDir02/1814/images/disk.0
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.VmDir-client-0=0x000000010000000000000000
>> trusted.afr.VmDir-client-1=0x000000010000000000000000
>> trusted.gfid=0xaabc0c344ccc4cfe8e2ed588dd78323b
>>
>> Note that these are just two sample files, since the problem occurs with
>> 100% of our "big" virtual machines.
>> Here's the whole content of the GlusterFS volume along with file sizes:
>>
>> 6.3G ./1981/images/disk.0
>> 53M  ./1820/images/disk.0
>> 9.7G ./1838/images/disk.0
>> 10G  ./1819/images/disk.0
>> 9.2G ./1818/images/disk.0
>> 10G  ./1816/images/disk.0
>> 53M  ./1962/images/disk.0
>> 10G  ./1814/images/disk.0
>> 6.2G ./1988/images/disk.0
>> 10G  ./1817/images/disk.0
>> 53M  ./1821/images/disk.0
>>
>> We currently have 11 running VMs. The "small" ones (53 MB) have never
>> shown any problem so far. *All* the other VMs (6 to 10 GB) periodically
>> show up in the output of:
>>
>> gluster volume heal VmDir info
>>
>> when there is some intense I/O occurring, disappearing shortly afterwards.
>>
>> Thanks, cheers,
>> --
>> : Dario Berzano
>> : CERN PH-SFT & Università di Torino (Italy)
>> : Wiki: http://newton.ph.unito.it/~berzano
>> : GPG: http://newton.ph.unito.it/~berzano/gpg
>> : Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
>>
>> On 14 Sep 2012, at 18:21, Pranith Kumar Karampuri <[email protected]> wrote:
>>
>>> Dario,
>>> Ok, that confirms that it is not a split-brain. Could you post the getfattr
>>> output I requested as well? What is the size of the VM files?
>>>
>>> Pranith
>>> ----- Original Message -----
>>> From: "Dario Berzano" <[email protected]>
>>> To: "Pranith Kumar Karampuri" <[email protected]>
>>> Cc: "<[email protected]>" <[email protected]>
>>> Sent: Friday, September 14, 2012 9:42:38 PM
>>> Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>>>
>>> # gluster volume heal VmDir info healed
>>>
>>> Heal operation on volume VmDir has been successful
>>>
>>> Brick one-san-01:/bricks/VmDir01
>>> Number of entries: 259
>>> Segmentation fault (core dumped)
>>>
>>> (same story for heal-failed) which seems to be exactly this bug:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=836421
>>>
>>> Should I upgrade to the latest QA RPMs to see what is going on?
>>>
>>> Btw, with split-brain I have no entries:
>>>
>>> Heal operation on volume VmDir has been successful
>>>
>>> Brick one-san-01:/bricks/VmDir01
>>> Number of entries: 0
>>>
>>> Brick one-san-02:/bricks/VmDir02
>>> Number of entries: 0
>>>
>>> Thank you, cheers,
>>> --
>>> : Dario Berzano
>>> : CERN PH-SFT & Università di Torino (Italy)
>>> : Wiki: http://newton.ph.unito.it/~berzano
>>> : GPG: http://newton.ph.unito.it/~berzano/gpg
>>> : Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
>>>
>>> On 14 Sep 2012, at 17:16, Pranith Kumar Karampuri <[email protected]> wrote:
>>>
>>> hi Dario,
>>> Could you post the output of the following commands:
>>>
>>> gluster volume heal VmDir info healed
>>> gluster volume heal VmDir info split-brain
>>>
>>> Also provide the output of 'getfattr -d -m . -e hex' on both bricks for
>>> the two files listed in the output of 'gluster volume heal VmDir info'.
>>>
>>> Pranith.
>>>
>>> ----- Original Message -----
>>> From: "Dario Berzano" <[email protected]>
>>> To: [email protected]
>>> Sent: Friday, September 14, 2012 6:57:32 PM
>>> Subject: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
>>>
>>> Hello,
>>>
>>> in our computing centre we have an infrastructure with a GlusterFS volume
>>> made of two bricks in replicated mode:
>>>
>>> Volume Name: VmDir
>>> Type: Replicate
>>> Volume ID: 9aab85df-505c-460a-9e5b-381b1bf3c030
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: one-san-01:/bricks/VmDir01
>>> Brick2: one-san-02:/bricks/VmDir02
>>>
>>> We are using this volume to store the running images of some KVM virtual
>>> machines, and thought we could benefit from the replicated storage in
>>> order to achieve more robustness as well as the ability to live-migrate VMs.
>>>
>>> Our GlusterFS volume VmDir is mounted on several hypervisors (three at
>>> the moment).
>>>
>>> However, in many cases (it is difficult to reproduce: the best way is to
>>> stress VM I/O), either when one brick becomes unavailable for some reason
>>> or when we perform live migrations, virtual machines decide to remount the
>>> filesystems on their virtual disks read-only. At the same time, on the
>>> hypervisors mounting the GlusterFS partitions, we spot kernel messages
>>> like:
>>>
>>> INFO: task kvm:13560 blocked for more than 120 seconds.
>>>
>>> By googling it I have found some "workarounds" to mitigate this problem,
>>> like mounting disks within virtual machines with barrier=0:
>>>
>>> http://invalidlogic.com/2012/04/28/ubuntu-precise-on-xenserver-disk-errors/
>>>
>>> but I actually fear damaging my virtual machine disks by doing such a
>>> thing!
>>>
>>> AFAIK, from GlusterFS v3.3 self-healing should be performed server-side
>>> (with no self-healing at all performed on the clients, and with big files
>>> locked granularly). When I connect to my GlusterFS pool, if I monitor the
>>> self-healing status continuously:
>>>
>>> watch -n1 'gluster volume heal VmDir info'
>>>
>>> I obtain an output like:
>>>
>>> Heal operation on volume VmDir has been successful
>>>
>>> Brick one-san-01:/bricks/VmDir01
>>> Number of entries: 2
>>> /1814/images/disk.0
>>> /1816/images/disk.0
>>>
>>> Brick one-san-02:/bricks/VmDir02
>>> Number of entries: 2
>>> /1816/images/disk.0
>>> /1814/images/disk.0
>>>
>>> with a list of virtual machine disks healed by GlusterFS. These and other
>>> files continuously appear and disappear from the list.
>>>
>>> This is a behavior I don't understand at all: does this mean that those
>>> files continuously get corrupted and healed, and self-healing is just a
>>> natural part of the replication process?!
>>> Or is some kind of corruption
>>> actually happening on our virtual disks for some reason? Is this related
>>> to the "remount read-only" problem?
>>>
>>> A more general question, maybe, would be: is GlusterFS v3.3 ready for
>>> storing running virtual machines (and is there some special configuration
>>> option needed on the volumes and clients for that)?
>>>
>>> Thank you in advance for shedding some light...
>>>
>>> Regards,
>>>
>>> --
>>> : Dario Berzano
>>> : CERN PH-SFT & Università di Torino (Italy)
>>> : Wiki: http://newton.ph.unito.it/~berzano
>>> : GPG: http://newton.ph.unito.it/~berzano/gpg
>>> _______________________________________________
>>> Gluster-users mailing list
>>> [email protected]
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
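On the question of special volume options for hosting running VMs: a set of tunables commonly suggested for VM-image workloads on replicated GlusterFS volumes is sketched below. These settings are not from this thread and their availability varies by version; verify each one against `gluster volume set help` on your installation before applying:

```shell
# Commonly suggested tunables for volumes hosting large VM images
# (assumptions - check availability on your GlusterFS version first).
gluster volume set VmDir cluster.eager-lock on         # hold locks across writes to the same big file
gluster volume set VmDir performance.quick-read off    # small-file optimization, unhelpful for VM images
gluster volume set VmDir performance.read-ahead off    # the guest OS already does its own read-ahead
gluster volume set VmDir performance.io-cache off      # avoid caching data the guest caches anyway
gluster volume set VmDir performance.stat-prefetch off # stat metadata of VM images changes constantly
```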
