Dario:
We have exactly the same problem as you. I suspected it was a side effect of
self-healing, so I stopped the self-heal daemon with "gluster volume set
<volume name> cluster.self-heal-daemon off". After that it worked well. Of course,
it will no longer work well if I restart one of the nodes.
So you may give it a try, good luck.
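For reference, the full pair of commands would look something like this (using
the VmDir volume name from this thread as an example; with the daemon off,
healing relies only on client access, so remember to re-enable it):

  # disable the self-heal daemon for the volume (workaround only)
  gluster volume set VmDir cluster.self-heal-daemon off
  # re-enable it once done testing:
  gluster volume set VmDir cluster.self-heal-daemon on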
Best Regards.
Jules Wang.
At 2012-09-17 18:18:33, "Dario Berzano" <[email protected]> wrote:
Hello Pranith,
OK, I understand that each time a write operation is performed the update
flag is set, and reset afterwards when the update is complete.
I really don't know GlusterFS internals, but... what if a live migration or
brick failure happens *while* these updates are ongoing?
The problem is, my VMs are definitely *not* doing fine :(
My former GlusterFS configuration had only one brick, and everything went
perfect. Problems started to arise as soon as we migrated to a replicated
infrastructure.
I wonder if the problem is:
- our network;
- some obscure configuration internal to the VMs;
- GlusterFS;
- a combination of all the above.
Since the only thing that changed is our GlusterFS configuration, I'm "pointing
the finger" at it: we put this replicated configuration in place to avoid
having single points of failure, *but* we have experienced more failures since
then! To me it seems very related to this issue, if not exactly the same
problem, although on a much smaller scale:
http://gluster.org/pipermail/gluster-users/2012-September/011444.html
Of course, the problem might be network-related; I am currently running tests to
sort it out.
Cheers
--
: Dario Berzano
: CERN PH-SFT & Università di Torino (Italy)
: Wiki: http://newton.ph.unito.it/~berzano
: GPG: http://newton.ph.unito.it/~berzano/gpg
: Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
On 17 Sep 2012, at 10:41, Pranith Kumar Karampuri
<[email protected]>
wrote:
Dario,
Nothing to worry about, then :-). It was a transient state. Every time an update is
performed the flags are set, and after the update is over they are reset. Similarly,
the output of 'gluster volume heal <volname> info' keeps showing entries while these
flags are set, and shows no output once they are reset. I thought it
was a persistent state. It seems your VM files are doing fine.
Pranith.
----- Original Message -----
From: "Dario Berzano" <[email protected]>
To: "Pranith Kumar Karampuri" <[email protected]>
Cc: "gluster-users" <[email protected]>
Sent: Monday, September 17, 2012 1:36:56 PM
Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
Hi Pranith,
those bricks are on different servers connected to the same switch: the only
possibility I see is that the switch went down for some reason; it is our only
single point of failure. The servers themselves never went down at the same
time.
However, I do not understand why, if I run getfattr continuously:
watch -n1 'getfattr -d -m . -e hex 1814/images/disk.0'
I get alternating:
trusted.afr.VmDir-client-0=0x000000010000000000000000
trusted.afr.VmDir-client-1=0x000000010000000000000000
and:
trusted.afr.VmDir-client-0=0x000000000000000000000000
trusted.afr.VmDir-client-1=0x000000000000000000000000
This again happens with every "big" file.
Does this suggest a network problem, maybe? One of the servers has 1 GbE while
the other one has a faster 10 GbE, but I do not think this is enough to
continuously de-synchronize the bricks...
Cheers
--
: Dario Berzano
: CERN PH-SFT & Università di Torino (Italy)
: Wiki: http://newton.ph.unito.it/~berzano
: GPG: http://newton.ph.unito.it/~berzano/gpg
: Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
On 17 Sep 2012, at 00:11, Pranith Kumar Karampuri
<[email protected]>
wrote:
1814/images/disk.0 has a pending data changelog for both subvolumes, i.e.
0x00000001. This happens when both bricks go down at the same time while an
operation is in progress. Did that happen?
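For reference, a minimal sketch of how to read these values: the trusted.afr.*
changelog is 12 bytes, hex-encoded as three big-endian 32-bit counters of
pending data, metadata and entry operations, so 0x000000010000000000000000
decodes to one pending data operation and nothing else:

  # split a trusted.afr changelog value into its three pending counters
  val=000000010000000000000000
  echo "data=0x${val:0:8} metadata=0x${val:8:8} entry=0x${val:16:8}"
  # prints: data=0x00000001 metadata=0x00000000 entry=0x00000000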
Pranith.
----- Original Message -----
From: "Dario Berzano" <[email protected]>
To: "Pranith Kumar Karampuri" <[email protected]>
Cc: "gluster-users" <[email protected]>
Sent: Sunday, September 16, 2012 9:20:23 PM
Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
Ok, here's the output for 1816/images/disk.0:
# file: bricks/VmDir01/1816/images/disk.0
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.VmDir-client-0=0x000000000000000000000000
trusted.afr.VmDir-client-1=0x000000000000000000000000
trusted.gfid=0x1cef9d386f1c4424af6d95dfbcf2989b
# file: bricks/VmDir02/1816/images/disk.0
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.VmDir-client-0=0x000000000000000000000000
trusted.afr.VmDir-client-1=0x000000000000000000000000
trusted.gfid=0x1cef9d386f1c4424af6d95dfbcf2989b
And for 1814/images/disk.0:
# file: bricks/VmDir01/1814/images/disk.0
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.VmDir-client-0=0x000000010000000000000000
trusted.afr.VmDir-client-1=0x000000010000000000000000
trusted.gfid=0xaabc0c344ccc4cfe8e2ed588dd78323b
# file: bricks/VmDir02/1814/images/disk.0
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.VmDir-client-0=0x000000010000000000000000
trusted.afr.VmDir-client-1=0x000000010000000000000000
trusted.gfid=0xaabc0c344ccc4cfe8e2ed588dd78323b
Note that these are just two sample files, since the problem occurs with 100%
of our "big" virtual machines. Here's the whole content of the GlusterFS volume
along with file sizes:
6.3G ./1981/images/disk.0
53M ./1820/images/disk.0
9.7G ./1838/images/disk.0
10G ./1819/images/disk.0
9.2G ./1818/images/disk.0
10G ./1816/images/disk.0
53M ./1962/images/disk.0
10G ./1814/images/disk.0
6.2G ./1988/images/disk.0
10G ./1817/images/disk.0
53M ./1821/images/disk.0
We currently have 11 running VMs. The "small" ones (53 MB) have never shown
any problem so far. *All* the other VMs (6 to 10 GB) periodically show up
in the output of:
gluster volume heal VmDir info
when there is some intense I/O occurring, then disappear shortly
afterwards.
Thanks, cheers,
--
: Dario Berzano
: CERN PH-SFT & Università di Torino (Italy)
: Wiki: http://newton.ph.unito.it/~berzano
: GPG: http://newton.ph.unito.it/~berzano/gpg
: Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
On 14 Sep 2012, at 18:21, Pranith Kumar Karampuri
<[email protected]> wrote:
Dario,
OK, that confirms it is not a split-brain. Could you post the getfattr
output I requested as well? What is the size of the VM files?
Pranith
----- Original Message -----
From: "Dario Berzano" <[email protected]>
To: "Pranith Kumar Karampuri" <[email protected]>
Cc: "<[email protected]>" <[email protected]>
Sent: Friday, September 14, 2012 9:42:38 PM
Subject: Re: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
# gluster volume heal VmDir info healed
Heal operation on volume VmDir has been successful
Brick one-san-01:/bricks/VmDir01
Number of entries: 259
Segmentation fault (core dumped)
(same story for heal-failed), which seems to be exactly this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=836421
Should I upgrade to the latest QA RPMs to see what is going on?
Btw, with split-brain I have no entries:
Heal operation on volume VmDir has been successful
Brick one-san-01:/bricks/VmDir01
Number of entries: 0
Brick one-san-02:/bricks/VmDir02
Number of entries: 0
Thank you, cheers,
--
: Dario Berzano
: CERN PH-SFT & Università di Torino (Italy)
: Wiki: http://newton.ph.unito.it/~berzano
: GPG: http://newton.ph.unito.it/~berzano/gpg
: Mobiles: +41 766124782 (CH), +39 3487222520 (IT)
On 14 Sep 2012, at 17:16, Pranith Kumar Karampuri
<[email protected]>
wrote:
hi Dario,
Could you post the output of the following commands:
gluster volume heal VmDir info healed
gluster volume heal VmDir info split-brain
Also, provide the output of 'getfattr -d -m . -e hex <file>' on both bricks for the
two files listed in the output of 'gluster volume heal VmDir info'.
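For example, run directly against the brick paths (taking a brick and file name
from this thread):

  getfattr -d -m . -e hex /bricks/VmDir01/1814/images/disk.0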
Pranith.
----- Original Message -----
From: "Dario Berzano" < [email protected] >
To: [email protected]
Sent: Friday, September 14, 2012 6:57:32 PM
Subject: [Gluster-users] Virtual machines and self-healing on GlusterFS v3.3
Hello,
in our computing centre we have an infrastructure with a GlusterFS volume made
of two bricks in replicated mode:
Volume Name: VmDir
Type: Replicate
Volume ID: 9aab85df-505c-460a-9e5b-381b1bf3c030
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: one-san-01:/bricks/VmDir01
Brick2: one-san-02:/bricks/VmDir02
We are using this volume to store running images of some KVM virtual machines
and thought we could benefit from the replicated storage in order to achieve
more robustness as well as the ability to live-migrate VMs.
Our GlusterFS volume VmDir is mounted on several (three at the moment)
hypervisors.
However, in many cases (it is difficult to reproduce; the best way is to stress
VM I/O), either when one brick becomes unavailable for some reason or when we
perform live migrations, virtual machines decide to remount the filesystems on
their virtual disks read-only. At the same time, on the hypervisors mounting
the GlusterFS partitions, we spot kernel messages like:
INFO: task kvm:13560 blocked for more than 120 seconds.
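As an aside, that message comes from the kernel's hung-task watchdog; the
120-second threshold is only a reporting timeout, so tuning it merely silences
the warning without fixing the underlying I/O stall:

  # inspect the hung-task watchdog threshold, 120 s by default
  cat /proc/sys/kernel/hung_task_timeout_secs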
By googling it I have found some "workarounds" to mitigate this problem, like
mounting disks within virtual machines with barrier=0:
http://invalidlogic.com/2012/04/28/ubuntu-precise-on-xenserver-disk-errors/
but I actually fear damaging my virtual machine disks by doing such a thing!
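For reference, such a workaround is just a guest-side mount option; a
hypothetical /etc/fstab entry (the device name is a placeholder) would look
like:

  # guest fstab entry disabling ext4 write barriers (placeholder device)
  # caution: barriers protect journal integrity on power loss
  /dev/vda1  /  ext4  defaults,barrier=0  1  1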
AFAIK, from GlusterFS v3.3 onwards, self-healing should be performed server-side
(no self-healing at all is performed on the clients) and by granularly locking
big files. When I connect to my GlusterFS pool, if I monitor the self-healing
status continuously:
watch -n1 'gluster volume heal VmDir info'
I obtain an output like:
Heal operation on volume VmDir has been successful
Brick one-san-01:/bricks/VmDir01
Number of entries: 2
/1814/images/disk.0
/1816/images/disk.0
Brick one-san-02:/bricks/VmDir02
Number of entries: 2
/1816/images/disk.0
/1814/images/disk.0
with a list of virtual machine disks healed by GlusterFS. Those and other files
continuously appear and disappear from the list.
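One rough way to tell transient entries from persistent ones (a sketch,
assuming the entries are the lines starting with '/' as in the output above) is
to sample the list repeatedly and count appearances; a file pending in every
sample is stuck, while one that comes and goes is transient:

  # sample heal info once per second for a minute, then count appearances
  for i in $(seq 1 60); do
    gluster volume heal VmDir info | grep '^/'
    sleep 1
  done | sort | uniq -c | sort -rn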
This is a behavior I don't understand at all: does it mean that those files
continuously get corrupted and healed, and that self-healing is just a natural
part of the replication process?! Or is some kind of corruption actually happening
on our virtual disks for some reason? Is this related to the "remount read-only"
problem?
A more general question maybe would be: is GlusterFS v3.3 ready for storing
running virtual machines (and is there some special configuration option needed
on the volumes and clients for that)?
Thank you in advance for shedding some light...
Regards,
--
: Dario Berzano
: CERN PH-SFT & Università di Torino (Italy)
: Wiki: http://newton.ph.unito.it/~berzano
: GPG: http://newton.ph.unito.it/~berzano/gpg
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users