Re: [Gluster-users] Self-heal doesn't appear to be happening

Joe Julian Sun, 15 Mar 2015 12:39:33 -0700

On 03/15/2015 11:16 AM, Jonathan Heese wrote:

Hello all,
I have a 2 node 2 brick replicate gluster volume that I'm having trouble making fault tolerant (a seemingly basic feature!) under CentOS 6.6 using EPEL packages.
Both nodes are as close to identical hardware and software as possible, and I'm running the following packages:
glusterfs-rdma-3.6.2-1.el6.x86_64
glusterfs-fuse-3.6.2-1.el6.x86_64
glusterfs-libs-3.6.2-1.el6.x86_64
glusterfs-cli-3.6.2-1.el6.x86_64
glusterfs-api-3.6.2-1.el6.x86_64
glusterfs-server-3.6.2-1.el6.x86_64
glusterfs-3.6.2-1.el6.x86_64

3.6.2 is not considered production stable. Based on your expressed concern, you should probably be running 3.5.3.

They both have dual-port Mellanox 20Gbps InfiniBand cards with a straight (i.e. "crossover") cable and opensm to facilitate the RDMA transport between them.
Here are some data dumps to set the stage (and yes, the output of these commands looks the same on both nodes):
[root@duchess ~]# gluster volume info

Volume Name: gluster_disk
Type: Replicate
Volume ID: b1279e22-8589-407b-8671-3760f42e93e4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: duke-ib:/bricks/brick1
Brick2: duchess-ib:/bricks/brick1


[root@duchess ~]# gluster volume status
Status of volume: gluster_disk
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick duke-ib:/bricks/brick1                    49153   Y       9594
Brick duchess-ib:/bricks/brick1                 49153   Y       9583
NFS Server on localhost                         2049    Y       9590
Self-heal Daemon on localhost                   N/A     Y       9597
NFS Server on 10.10.10.1                        2049    Y       9607
Self-heal Daemon on 10.10.10.1                  N/A     Y       9614

Task Status of Volume gluster_disk
------------------------------------------------------------------------------
There are no active volume tasks


[root@duchess ~]# gluster peer status
Number of Peers: 1

Hostname: 10.10.10.1
Uuid: aca56ec5-94bb-4bb0-8a9e-b3d134bbfe7b
State: Peer in Cluster (Connected)
So before putting any real data on these guys (the data will eventually be a handful of large image files backing an iSCSI target via tgtd for ESXi datastores), I wanted to simulate the failure of one of the nodes. So I stopped glusterfsd and glusterd on duchess, waited about 5 minutes, then started them back up again, tail'ing /var/log/glusterfs/* and /var/log/messages. I'm not sure exactly what I'm looking for, but the logs quieted down after just a minute or so of restarting the daemons. I didn't see much indicating that self-healing was going on.
Every now and then (and seemingly more often than not), when I run "gluster volume heal gluster_disk info", I get no output from the command, and the following dumps into my /var/log/messages:
Mar 15 13:59:16 duchess kernel: glfsheal[10365]: segfault at 7ff56068d020 ip 00007ff54f366d80 sp 00007ff54e22adf8 error 6 in libmthca-rdmav2.so[7ff54f365000+7000]

This is a segfault in the Mellanox driver. Please report it to the driver developers.

Mar 15 13:59:17 duchess abrtd: Directory 'ccpp-2015-03-15-13:59:16-10359' creation detected
Mar 15 13:59:17 duchess abrt[10368]: Saved core dump of pid 10359 (/usr/sbin/glfsheal) to /var/spool/abrt/ccpp-2015-03-15-13:59:16-10359 (225595392 bytes)
Mar 15 13:59:25 duchess abrtd: Package 'glusterfs-server' isn't signed with proper key
Mar 15 13:59:25 duchess abrtd: 'post-create' on '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359' exited with 1
Mar 15 13:59:25 duchess abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359'
Other times, when I'm lucky, I get messages from the "heal info" command indicating that datastore1.img (the file that I intentionally changed while duchess was offline) is in need of healing:
[root@duke ~]# gluster volume heal gluster_disk info
Brick duke.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1

Brick duchess.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1
But watching df on the bricks and tailing glustershd.log doesn't seem to indicate that anything is actually happening -- and df indicates that brick on duke *is* different in file size from the brick on duchess. It's been over an hour now, and I'm not confident that the selfheal functionality is even working at all... Nor do I know how to do anything about it!

File sizes are not necessarily any indication. If the changes you made were nulls, the change may be sparse. du --apparent-size is a slightly better indicator than df. Comparing hashes would be even better.
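
A rough sketch of that comparison, assuming GNU coreutils (du, sha256sum) on both nodes. The function name brick_fingerprint is made up for illustration; run it against the brick copy on each server while nothing is writing to the file, then compare the two output lines by eye or with diff.

```shell
# Hypothetical helper: print "<apparent size in bytes> <sha256>" for one file.
# Identical output on both bricks suggests the copies hold the same data,
# regardless of how many blocks each copy actually allocates on disk.
brick_fingerprint() {
    file=$1
    # Apparent size ignores sparse holes, unlike plain du or df.
    size=$(du --apparent-size --block-size=1 "$file" | cut -f1)
    # Content hash; holes read back as zeros, so sparseness does not matter.
    sum=$(sha256sum "$file" | cut -d' ' -f1)
    printf '%s %s\n' "$size" "$sum"
}
```

For example, "brick_fingerprint /bricks/brick1/datastore1.img" on duke and again on duchess. Hashing a large image takes a while, and a file that is still being healed or written will legitimately differ.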

The extended attributes on the file itself, on the bricks, can tell you the heal state. Look at "getfattr -m . -d -e hex $file". The trusted.afr attributes, if non-zero, show pending changes destined for the other server.
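
A minimal sketch for reading that output, under a couple of assumptions: the helper names afr_pending and afr_decode are invented here, and the decoder assumes each trusted.afr value is 12 bytes holding three 32-bit counters in the order data, metadata, entry. Check that layout against the documentation for your version before relying on it.

```shell
# Hypothetical helper: read `getfattr -m . -d -e hex FILE` output on stdin
# and print only the trusted.afr.* attributes whose value is not all zeros.
afr_pending() {
    grep '^trusted\.afr\.' | grep -Ev '=0x0+$'
}

# Hypothetical helper: split one 24-hex-digit value into three counters.
# Assumed layout: data, metadata, entry (8 hex digits each).
afr_decode() {
    hex=${1#0x}
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    printf 'data=%d metadata=%d entry=%d\n' "0x$data" "0x$meta" "0x$entry"
}
```

Run as root on each brick, e.g. "getfattr -m . -d -e hex /bricks/brick1/datastore1.img | afr_pending". No output means no pending changes are recorded on that brick; any line printed names the client (the other brick) that still has changes owed to it.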

Also, I find it a little bit troubling that I'm using the aliases (in /etc/hosts on both servers) duke-ib and duchess-ib for the gluster node configuration, but the "heal info" command refers to my nodes with their internal FQDNs, which resolve to their 1Gbps interface IPs... That doesn't mean that they're trying to communicate over those interfaces (the volume is configured with "transport rdma", as you can see above), does it?

I'd call that a bug. It should report the hostnames as they're listed in the volume info.



Can anyone throw out any ideas on how I can:

1. Determine whether this is intentional behavior (or a bug?),

2. Determine whether my data has been properly resync'd across the bricks, and


3. Make it work correctly if not.


Thanks in advance!


Regards,

Jon Heese



_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
