On 05/13/2013 09:13 PM, Darren wrote:
Hi all,
I'm pretty new to Gluster, and the company I work for uses it for
storage across 2 data centres. An issue has cropped up fairly recently
with regards to the self-heal mechanism.
Occasionally the connection between these 2 Gluster servers breaks or
drops momentarily. Due to the nature of the business it's highly likely
that files have been written during this time. When the self-heal daemon
runs it notices a discrepancy and gets the volume up to date. The
problem we've been seeing is that this appears to cause the CPU load to
increase massively on both servers whilst the healing process takes place.
After trying to find out if there were any persistent network issues I
tried recreating this on a test system and can now reproduce it at will.
Our test setup consists of 3 VMs: 2 Gluster servers and a client. The
process to cause this was:
Add in an iptables rule to block one of the Gluster servers from being
reached by the other server and the client.
Create some random files on the client.
Can you describe the number and sizes of files created as part of this
step?
Flush the iptables rules out so the server is reachable again.
Force a self heal to run.
Watch as the load on the Gluster servers goes bananas.
How high does your load get to? Is it just the CPU or do you see other
resources like memory, network being consumed to a greater degree as well?
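For anyone who wants to try the same thing, the steps above could be scripted roughly as below. This is only a sketch under assumptions: the peer address, the client mount point, and the number and size of the test files are placeholders that do not come from this thread, and the exact iptables rule used was not stated. In practice each step runs on a different host, so the script defaults to printing the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the reproduction steps; prints commands unless DRY_RUN=0.
# Everything marked "hypothetical" is an assumption, not from the thread.
PEER_IP="${PEER_IP:-192.0.2.10}"   # hypothetical address of the server to isolate
VOLUME="${VOLUME:-volume1}"        # volume name as shown by "gluster volume info"
MOUNT="${MOUNT:-/mnt/gluster}"     # hypothetical glusterfs mount point on the client
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1 (on the other server and on the client): drop traffic from one server.
# The exact rule is an assumption; the thread only says iptables was used.
block_peer() { run iptables -A INPUT -s "$PEER_IP" -j DROP; }

# Step 2 (on the client): create some random files.
# Count and size are arbitrary placeholders.
create_files() {
    i=1
    while [ "$i" -le 3 ]; do
        run dd if=/dev/urandom of="$MOUNT/test_$i.bin" bs=1M count=10
        i=$((i + 1))
    done
}

# Step 3: flush the iptables rules so the server is reachable again.
unblock_peer() { run iptables -F; }

# Step 4 (on a Gluster server): force a full self-heal.
force_heal() { run gluster volume heal "$VOLUME" full; }

block_peer
create_files
unblock_peer
force_heal
```

While the heal runs, `gluster volume heal <volume> info` on one of the servers lists the entries still pending, which may help correlate the load with the amount of data being healed.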
The problem with this is that whilst the self-heal happens one of the
Gluster servers will be inaccessible from the client, meaning no files
can be read or written, causing problems for our users.
I've been searching for a solution, or at least someone else who has
been having the same problem and not found anything. I don't know if
this is a bug or config issue (see below for config details). I've tried
a variety of different options but none of them have had any effect.
Our production set up is as follows:
2 Gluster servers (1 in each DC) replicating to each other
We then have multiple other servers that store and retrieve files on
Gluster using a local glusterfs mount point.
Only 1 data centre is active at any one time
The Gluster servers are VMs on a Xen hypervisor.
All our systems are CentOS 5
Gluster 3.3.1 (I've also tried 3.3.2)
gluster02 ~ gluster volume info rmfs
Volume Name: volume1
Type: Replicate
Volume ID: 3fef44e1-e840-452e-b16b-a9fc698e7dfd
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster01:/mnt/store1
Brick2: gluster02:/mnt/store1
Options Reconfigured:
nfs.disable: off
auth.allow: 172.30.98.*
network.ping-timeout: 5
Setting network.ping-timeout to 5 is generally not recommended. As a
matter of fact, it would not be advisable to alter the ping timeout from
the default value.
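As a rough illustration, putting the option back to its default could look like the following. The default for network.ping-timeout is 42 seconds; the volume name is taken from the volume info output above and should be adjusted if the real name differs.

```shell
# Set network.ping-timeout back to the default of 42 seconds.
# VOLUME is taken from the "Volume Name" line above; adjust as needed.
VOLUME=volume1
gluster volume set "$VOLUME" network.ping-timeout 42

# Confirm the change under "Options Reconfigured".
gluster volume info "$VOLUME"
```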
Regards,
Vijay
Any help or suggestions would be greatly appreciated. If you need
anything else from me, just ask.
Thanks,
Darren
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users