Joe,

First, allow me to apologize for top-posting; my webmail client doesn't leave me
much choice, unfortunately.


Second, I'd like to thank you profusely for replying so quickly to my question. 
I am accustomed to long wait times on most OSS mailing lists. :)


So... I wish I had known that the procedure recommended at
http://blog.gluster.org/2014/08/debunking-the-glusterfs-rdma-is-unstable-myth/
(which is what I followed to get started on this little adventure) would leave
me with packages that aren't considered production-stable. :/


I went ahead and yanked out all of the 3.6.2 stuff and reinstalled with 3.5.3, 
and it's like night and day. I can stop the glusterd on duchess, write out a 
2GB file on the volume mountpoint on duke, and when I start glusterd back up on 
duchess, I can't even type "ls" fast enough before the new file is on the brick 
locally.


Also, running "gluster volume heal $vol info" no longer results in the segfault 
and always gives me useful output (even if it's just to say that everything is 
fine...).
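Now that the command gives dependable output, it can also be checked from a
script during repeated failover tests. Below is a minimal Python sketch (nothing
that ships with GlusterFS; the function names are hypothetical). It assumes the
plain-text layout seen in this thread, where each brick section starts with a
"Brick <host>:<path>" line and ends with "Number of entries: N", and that the
gluster CLI is on PATH on the node where it runs.

```python
import subprocess


def heal_info(volume):
    """Run 'gluster volume heal <volume> info' and return its stdout.

    Assumes the gluster CLI is installed and on PATH (run on a server node).
    """
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_heal_info(text):
    """Return {brick: pending_entry_count} parsed from heal info output.

    Assumes each brick section starts with "Brick <host>:<path>" and ends
    with "Number of entries: N". Empty input yields {}, which means
    "no usable output" (e.g. the tool crashed), not "nothing to heal".
    """
    counts = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            counts[brick] = int(line.split(":", 1)[1])
            brick = None
    return counts
```

Something like parse_heal_info(heal_info("gluster_disk")) gives one count per
brick; treating an empty result as a failure is what would have caught the
silent segfaults seen with 3.6.2.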


For now, I think this has the potential to cure all of my issues here. I
will keep testing, and I'll post back here if I need any further assistance.


Oh, by the way, I still get inaccurate node names from "gluster volume heal 
$vol info" with 3.5.3:

[root@duchess ~]# gluster volume heal gluster_disk info
Brick duke.jonheese.local:/bricks/brick1/
Number of entries: 0

Brick duchess.jonheese.local:/bricks/brick1/
Number of entries: 0


Notice that the nodes are named "duke-ib" and "duchess-ib" in the 'volume
info' output:
[root@duchess ~]# gluster volume info

Volume Name: gluster_disk
Type: Replicate
Volume ID: 7158b824-455f-46f0-9da3-9b4d6c1fc484
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: duke-ib:/bricks/brick1
Brick2: duchess-ib:/bricks/brick1


Should I raise a bug for this?


Thanks again!


Regards,

Jon Heese


________________________________
From: [email protected] <[email protected]> on 
behalf of Joe Julian <[email protected]>
Sent: Sunday, March 15, 2015 3:39 PM
To: [email protected]
Subject: Re: [Gluster-users] Self-heal doesn't appear to be happening

On 03/15/2015 11:16 AM, Jonathan Heese wrote:

Hello all,


I have a two-node, two-brick replicated Gluster volume that I'm having trouble
making fault-tolerant (a seemingly basic feature!) under CentOS 6.6 using EPEL
packages.


Both nodes are as close to identical hardware and software as possible, and I'm 
running the following packages:

glusterfs-rdma-3.6.2-1.el6.x86_64
glusterfs-fuse-3.6.2-1.el6.x86_64
glusterfs-libs-3.6.2-1.el6.x86_64
glusterfs-cli-3.6.2-1.el6.x86_64
glusterfs-api-3.6.2-1.el6.x86_64
glusterfs-server-3.6.2-1.el6.x86_64
glusterfs-3.6.2-1.el6.x86_64

3.6.2 is not considered production stable. Based on your expressed concern, you 
should probably be running 3.5.3.


They both have dual-port Mellanox 20Gbps InfiniBand cards with a straight (i.e. 
"crossover") cable and opensm to facilitate the RDMA transport between them.


Here are some data dumps to set the stage (and yes, the output of these 
commands looks the same on both nodes):


[root@duchess ~]# gluster volume info

Volume Name: gluster_disk
Type: Replicate
Volume ID: b1279e22-8589-407b-8671-3760f42e93e4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: duke-ib:/bricks/brick1
Brick2: duchess-ib:/bricks/brick1


[root@duchess ~]# gluster volume status
Status of volume: gluster_disk
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick duke-ib:/bricks/brick1                            49153   Y       9594
Brick duchess-ib:/bricks/brick1                         49153   Y       9583
NFS Server on localhost                                 2049    Y       9590
Self-heal Daemon on localhost                           N/A     Y       9597
NFS Server on 10.10.10.1                                2049    Y       9607
Self-heal Daemon on 10.10.10.1                          N/A     Y       9614

Task Status of Volume gluster_disk
------------------------------------------------------------------------------
There are no active volume tasks


[root@duchess ~]# gluster peer status
Number of Peers: 1

Hostname: 10.10.10.1
Uuid: aca56ec5-94bb-4bb0-8a9e-b3d134bbfe7b
State: Peer in Cluster (Connected)


So before putting any real data on these guys (the data will eventually be a
handful of large image files backing an iSCSI target via tgtd for ESXi
datastores), I wanted to simulate the failure of one of the nodes. To do that, I
stopped glusterfsd and glusterd on duchess, waited about 5 minutes, then started
them back up again, tailing /var/log/glusterfs/* and /var/log/messages. I'm not
sure exactly what I'm looking for, but the logs quieted down within a minute or
so of restarting the daemons, and I didn't see much indicating that self-healing
was going on.


Every now and then (and seemingly more often than not), when I run "gluster 
volume heal gluster_disk info", I get no output from the command, and the 
following dumps into my /var/log/messages:


Mar 15 13:59:16 duchess kernel: glfsheal[10365]: segfault at 7ff56068d020 ip 00007ff54f366d80 sp 00007ff54e22adf8 error 6 in libmthca-rdmav2.so[7ff54f365000+7000]

This is a segfault in the Mellanox driver. Please report it to the driver
developers.

Mar 15 13:59:17 duchess abrtd: Directory 'ccpp-2015-03-15-13:59:16-10359' creation detected
Mar 15 13:59:17 duchess abrt[10368]: Saved core dump of pid 10359 (/usr/sbin/glfsheal) to /var/spool/abrt/ccpp-2015-03-15-13:59:16-10359 (225595392 bytes)
Mar 15 13:59:25 duchess abrtd: Package 'glusterfs-server' isn't signed with proper key
Mar 15 13:59:25 duchess abrtd: 'post-create' on '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359' exited with 1
Mar 15 13:59:25 duchess abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359'


Other times, when I'm lucky, I get messages from the "heal info" command 
indicating that datastore1.img (the file that I intentionally changed while 
duchess was offline) is in need of healing:


[root@duke ~]# gluster volume heal gluster_disk info
Brick duke.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1

Brick duchess.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1


But watching df on the bricks and tailing glustershd.log doesn't seem to
indicate that anything is actually happening, and df indicates that the brick
on duke *is* different in size from the brick on duchess. It's been over an
hour now, and I'm not confident that the self-heal functionality is even working
at all... Nor do I know how to do anything about it!

File sizes aren't necessarily any indication. If the changes you made were runs
of null bytes, the file may be sparse. du --apparent-size is a somewhat better
indicator. Comparing hashes would be even better.
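A rough sketch of that comparison in Python, assuming Linux (where st_blocks
counts 512-byte units) and read access to the brick directory. The function
name and the choice of SHA-256 are illustrative, not anything GlusterFS
provides. Run it against the same file on each brick (e.g.
/bricks/brick1/datastore1.img on duke and on duchess) and compare the digests:
the apparent sizes of two healthy copies should match, but the allocated sizes
can differ if only one copy is sparse, even when the contents are identical.

```python
import hashlib
import os


def brick_file_report(path, chunk_size=1024 * 1024):
    """Return apparent size, allocated size and SHA-256 digest of one file.

    Assumes Linux, where st_blocks is counted in 512-byte units.
    """
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte images don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "apparent_bytes": st.st_size,           # comparable to du --apparent-size
        "allocated_bytes": st.st_blocks * 512,  # comparable to plain du; smaller if sparse
        "sha256": digest.hexdigest(),
    }
```

Hashing a file that is still being written, or is mid-heal, will of course give
different digests, so it's only meaningful once the volume is quiet.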

The extended attributes on the file itself, on the bricks, can tell you the 
heal state. Look at "getfattr -m . -d -e hex $file". The trusted.afr 
attributes, if non-zero, show pending changes destined for the other server.
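To make those hex values easier to read, here is a minimal Python sketch that
decodes the trusted.afr.* lines from that getfattr output. It assumes each value
is the 12-byte AFR changelog made of three big-endian 32-bit counters (pending
data, metadata and entry operations). The sample below is hypothetical and only
assumed to follow the trusted.afr.<volume>-client-<N> naming; it is not output
captured from these servers.

```python
def decode_afr(getfattr_output):
    """Return {attribute: (data, metadata, entry)} for each trusted.afr.* line.

    Expects text from 'getfattr -m . -d -e hex <file>' run on a brick.
    Assumes each value is a 0x-prefixed, 12-byte changelog made of three
    big-endian 32-bit pending-operation counters.
    """
    decoded = {}
    for line in getfattr_output.splitlines():
        line = line.strip()
        if not line.startswith("trusted.afr.") or "=" not in line:
            continue  # skip "# file:" headers and other attributes
        name, value = line.split("=", 1)
        raw = bytes.fromhex(value[2:] if value.startswith("0x") else value)
        decoded[name] = tuple(
            int.from_bytes(raw[i:i + 4], "big") for i in range(0, 12, 4)
        )
    return decoded


def pending_heals(getfattr_output):
    """Keep only attributes with at least one non-zero counter."""
    return {k: v for k, v in decode_afr(getfattr_output).items() if any(v)}


# Hypothetical output for datastore1.img on duke's brick while duchess was down.
SAMPLE = """\
# file: bricks/brick1/datastore1.img
trusted.afr.gluster_disk-client-0=0x000000000000000000000000
trusted.afr.gluster_disk-client-1=0x000000020000000000000000
trusted.gfid=0x0123456789abcdef0123456789abcdef
"""
```

In this made-up example only the -client-1 attribute is non-zero (two pending
data operations), i.e. changes recorded against the second brick in the volume,
presumably duchess. Once the heal completes, all counters should read zero on
both bricks.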


Also, I find it a little bit troubling that I'm using the aliases (in 
/etc/hosts on both servers) duke-ib and duchess-ib for the gluster node 
configuration, but the "heal info" command refers to my nodes with their 
internal FQDNs, which resolve to their 1Gbps interface IPs... That doesn't mean 
that they're trying to communicate over those interfaces (the volume is 
configured with "transport rdma", as you can see above), does it?

I'd call that a bug. It should report the hostnames as they're listed in the 
volume info.
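Separately from the reporting bug, it's easy to confirm what each name resolves
to on a given node. A small Python sketch using only the system resolver
(socket.getaddrinfo, which honors /etc/hosts as configured on the box); the
helper name is hypothetical. Note that it only shows which IPv4 addresses each
name maps to and says nothing about which interface Gluster actually uses for
traffic.

```python
import socket


def ipv4_addresses(names):
    """Map each hostname to the sorted IPv4 addresses the local resolver returns.

    Names that don't resolve map to an empty list.
    """
    resolved = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None, socket.AF_INET)
        except socket.gaierror:
            resolved[name] = []
            continue
        # getaddrinfo returns one tuple per socket type; keep unique addresses.
        resolved[name] = sorted({info[4][0] for info in infos})
    return resolved
```

For example, ipv4_addresses(["duke-ib", "duchess-ib", "duke.jonheese.local",
"duchess.jonheese.local"]) should show the -ib aliases on the InfiniBand
addresses and the FQDNs on the 1Gbps addresses, matching the setup described
above.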


Can anyone throw out any ideas on how I can:

1. Determine whether this is intentional behavior (or a bug?),

2. Determine whether my data has been properly resync'd across the bricks, and

3. Make it work correctly if not.


Thanks in advance!


Regards,

Jon Heese



_______________________________________________
Gluster-users mailing list
[email protected]<mailto:[email protected]>
http://www.gluster.org/mailman/listinfo/gluster-users
