I have some connectivity errors with GlusterFS mount points that I can't get 
solved. We have a pretty basic setup with two Gluster bricks and a bunch of 
clients (all 3.3.2). Very occasionally we have a brief network outage, after 
which some Gluster mount points become unavailable. Gluster mounts of the same 
bricks on other servers have no problems. 

The console on the client shows:
mountall: Plymouth command failed
mountall: Disconnected from Plymouth
mountall: Event failed
mountall: Skipping mounting /home since Plymouth is not available

Manual mount gives:
$ sudo mount /home
unknown option _netdev (ignored)
ERROR: Mount point does not exist.
Usage:  mount.glusterfs <volumeserver>:<volumeid/volumeport> -o <options> 
<mount point>

On the client, I can see a few hung connections (lsof | grep TCP shows them 
stuck in SYN_SENT, source port 24010 on client). The connection tracker of 
iptables also seems to have issues:
Nov 22 09:28:36 app16 kernel: [3180197.360596] [INPUT] dropped IN=eth0 OUT= 
MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 
LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 
WINDOW=14480 RES=0x00 ACK SYN URGP=0 
Nov 22 09:28:37 app16 kernel: [3180198.156075] [INPUT] dropped IN=eth0 OUT= 
MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 
LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 
WINDOW=14480 RES=0x00 ACK SYN URGP=0 
Nov 22 09:28:44 app16 kernel: [3180205.377404] [INPUT] dropped IN=eth0 OUT= 
MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 
LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 
WINDOW=14480 RES=0x00 ACK SYN URGP=0 
Nov 22 09:28:45 app16 kernel: [3180206.160003] [INPUT] dropped IN=eth0 OUT= 
MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 
LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 
WINDOW=14480 RES=0x00 ACK SYN URGP=0 
Nov 22 09:29:00 app16 kernel: [3180221.410958] [INPUT] dropped IN=eth0 OUT= 
MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 
LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 
WINDOW=14480 RES=0x00 ACK SYN URGP=0 
Nov 22 09:29:00 app16 kernel: [3180222.154831] [INPUT] dropped IN=eth0 OUT= 
MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 
LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 
WINDOW=14480 RES=0x00 ACK SYN URGP=0 
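
To get an overview of what the firewall drops, the entries can be reduced to 
source, destination, ports and TCP flags. This is only a rough sketch, assuming 
each log entry sits on a single line (as it does in the kernel log itself) and 
uses the iptables LOG format shown above. The name summarize_drops is a 
made-up helper, not part of any tool.

```shell
# Hypothetical helper (name is an assumption): reads iptables LOG lines on
# stdin, one entry per line, and prints "<count> SRC:SPT -> DST:DPT FLAGS"
# for each distinct combination found in "[INPUT] dropped" entries.
summarize_drops() {
    awk '
    /\[INPUT\] dropped/ {
        src = ""; dst = ""; spt = ""; dpt = ""; flags = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^SRC=/) src = substr($i, 5)
            else if ($i ~ /^DST=/) dst = substr($i, 5)
            else if ($i ~ /^SPT=/) spt = substr($i, 5)
            else if ($i ~ /^DPT=/) dpt = substr($i, 5)
            else if ($i ~ /^(SYN|ACK|FIN|RST|PSH)$/)
                # TCP flags are logged as bare words; collect them in order
                flags = (flags == "" ? $i : flags "," $i)
        }
        count[src ":" spt " -> " dst ":" dpt " " flags]++
    }
    END { for (k in count) print count[k], k }
    '
}
```

It could be fed with something like `summarize_drops < /var/log/kern.log` (the 
log path is an assumption). The six entries above are identical in these 
fields: all are ACK SYN packets from 10.243.0.24 port 24010 to port 1021 on 
10.243.0.76.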

The workaround is to manually unmount and remount the failed shares. After 
that, lsof shows no more SYN_SENT connections and the share is accessible 
again. But what is the cause of this? We need the shares to be available at 
all times, especially after the network recovers. That's the whole point of 
distributed file systems...
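
For reference, the manual workaround above could be scripted roughly as 
follows. This is a sketch under assumptions, not a fix for the underlying 
cause: it assumes the native client shows up as type fuse.glusterfs in 
/proc/mounts, that the share has an /etc/fstab entry, and that it runs as 
root. All function names are made up for illustration.

```shell
# Hypothetical helper: succeeds if mount point $1 is listed with type
# fuse.glusterfs in the mounts table $2 (defaults to /proc/mounts).
is_gluster_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 == "fuse.glusterfs" { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Hypothetical helper: the manual workaround, i.e. unmount and mount again.
# The lazy unmount (-l) is an assumption, used so a hung mount does not block.
remount_share() {
    umount -l "$1" 2>/dev/null
    mount "$1"          # relies on the /etc/fstab entry for $1
}

# Hypothetical helper: remount $1 unless it is mounted and answers a stat
# call within 10 seconds.
check_share() {
    if is_gluster_mounted "$1" && timeout 10 stat -t "$1" >/dev/null 2>&1; then
        return 0
    fi
    remount_share "$1"
}
```

Calling `check_share /home` from cron would at least shorten the time a share 
stays unavailable, though it still leaves the question of the cause open.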

Some background info. /etc/fstab contains:
file1.cluster.peercode.nl:GLUSTER-HOME  /home   glusterfs       nobootwait,backupvolfile-server=file2.cluster.peercode.nl       0       0

This is the log of brick 10.243.0.76 during a short network hiccup:
[2013-11-21 21:57:07.877100] W [client3_1-fops.c:647:client3_1_unlink_cbk] 
0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
[2013-11-21 22:07:07.984100] W [client3_1-fops.c:647:client3_1_unlink_cbk] 
0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
[2013-11-21 22:17:08.093102] W [client3_1-fops.c:647:client3_1_unlink_cbk] 
0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
[2013-11-21 22:25:53.475072] W [socket.c:195:__socket_rwv] 
0-GLUSTER-HOME-client-1: readv failed (Connection reset by peer)
[2013-11-21 22:25:53.475149] W [socket.c:1512:__socket_proto_state_machine] 
0-GLUSTER-HOME-client-1: reading from socket failed. Error (Connection reset by 
peer), peer (10.243.0.24:24009)
[2013-11-21 22:25:53.492487] I [client.c:2090:client_rpc_notify] 
0-GLUSTER-HOME-client-1: disconnected
[2013-11-21 22:25:54.536414] W [socket.c:195:__socket_rwv] 
0-GLUSTER-SHARE-client-1: readv failed (Connection reset by peer)
[2013-11-21 22:25:54.536454] W [socket.c:1512:__socket_proto_state_machine] 
0-GLUSTER-SHARE-client-1: reading from socket failed. Error (Connection reset 
by peer), peer (10.243.0.24:24010)
[2013-11-21 22:25:54.536503] I [client.c:2090:client_rpc_notify] 
0-GLUSTER-SHARE-client-1: disconnected
[2013-11-21 22:26:03.539704] I 
[client-handshake.c:1614:select_server_supported_programs] 
0-GLUSTER-HOME-client-1: Using Program GlusterFS 3.3.1, Num (1298437), Version 
(330)
[2013-11-21 22:26:03.541640] I [client-handshake.c:1411:client_setvolume_cbk] 
0-GLUSTER-HOME-client-1: Connected to 10.243.0.24:24009, attached to remote 
volume '/data/export-home-2'.
[2013-11-21 22:26:03.541668] I [client-handshake.c:1423:client_setvolume_cbk] 
0-GLUSTER-HOME-client-1: Server and Client lk-version numbers are not same, 
reopening the fds
[2013-11-21 22:26:03.548534] I 
[client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-HOME-client-1: 
Server lk version = 1
[2013-11-21 22:26:05.536563] I 
[client-handshake.c:1614:select_server_supported_programs] 
0-GLUSTER-SHARE-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version 
(330)
[2013-11-21 22:26:05.537510] I [client-handshake.c:1411:client_setvolume_cbk] 
0-GLUSTER-SHARE-client-1: Connected to 10.243.0.24:24010, attached to remote 
volume '/data/export-share-2'.
[2013-11-21 22:26:05.537530] I [client-handshake.c:1423:client_setvolume_cbk] 
0-GLUSTER-SHARE-client-1: Server and Client lk-version numbers are not same, 
reopening the fds
[2013-11-21 22:26:05.541133] I 
[client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-SHARE-client-1: 
Server lk version = 1
[2013-11-21 22:27:08.549143] W [client3_1-fops.c:647:client3_1_unlink_cbk] 
0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
[2013-11-21 22:37:08.655387] W [client3_1-fops.c:647:client3_1_unlink_cbk] 
0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
[2013-11-21 22:47:05.551891] W [socket.c:195:__socket_rwv] 
0-GLUSTER-SHARE-client-1: readv failed (Connection timed out)
[2013-11-21 22:47:05.551961] W [socket.c:1512:__socket_proto_state_machine] 
0-GLUSTER-SHARE-client-1: reading from socket failed. Error (Connection timed 
out), peer (10.243.0.24:24010)
[2013-11-21 22:47:05.552011] I [client.c:2090:client_rpc_notify] 
0-GLUSTER-SHARE-client-1: disconnected
[2013-11-21 22:47:07.599889] W [socket.c:195:__socket_rwv] 
0-GLUSTER-HOME-client-1: readv failed (Connection timed out)
[2013-11-21 22:47:07.599956] W [socket.c:1512:__socket_proto_state_machine] 
0-GLUSTER-HOME-client-1: reading from socket failed. Error (Connection timed 
out), peer (10.243.0.24:24009)
[2013-11-21 22:47:07.600008] I [client.c:2090:client_rpc_notify] 
0-GLUSTER-HOME-client-1: disconnected
[2013-11-21 22:47:08.761366] E [afr-self-heald.c:418:_crawl_proceed] 
0-GLUSTER-SHARE-replicate-0: Stopping crawl as < 2 children are up
[2013-11-21 22:47:08.764653] E [afr-self-heald.c:418:_crawl_proceed] 
0-GLUSTER-HOME-replicate-0: Stopping crawl as < 2 children are up
[2013-11-21 22:47:18.759922] E [socket.c:1715:socket_connect_finish] 
0-GLUSTER-HOME-client-1: connection to 10.243.0.24:24009 failed (No route to 
host)
[2013-11-21 22:48:18.907865] E [socket.c:1715:socket_connect_finish] 
0-GLUSTER-SHARE-client-1: connection to 10.243.0.24:24010 failed (Connection 
timed out)
[2013-11-21 22:49:50.825110] I 
[client-handshake.c:1614:select_server_supported_programs] 
0-GLUSTER-HOME-client-1: Using Program GlusterFS 3.3.1, Num (1298437), Version 
(330)
[2013-11-21 22:49:50.825887] I [client-handshake.c:1411:client_setvolume_cbk] 
0-GLUSTER-HOME-client-1: Connected to 10.243.0.24:24009, attached to remote 
volume '/data/export-home-2'.
[2013-11-21 22:49:50.825906] I [client-handshake.c:1423:client_setvolume_cbk] 
0-GLUSTER-HOME-client-1: Server and Client lk-version numbers are not same, 
reopening the fds
[2013-11-21 22:49:50.826525] I 
[client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-HOME-client-1: 
Server lk version = 1
[2013-11-21 22:49:52.863320] I 
[client-handshake.c:1614:select_server_supported_programs] 
0-GLUSTER-SHARE-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version 
(330)
[2013-11-21 22:49:52.864061] I [client-handshake.c:1411:client_setvolume_cbk] 
0-GLUSTER-SHARE-client-1: Connected to 10.243.0.24:24010, attached to remote 
volume '/data/export-share-2'.
[2013-11-21 22:49:52.864089] I [client-handshake.c:1423:client_setvolume_cbk] 
0-GLUSTER-SHARE-client-1: Server and Client lk-version numbers are not same, 
reopening the fds
[2013-11-21 22:49:52.864841] I 
[client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-SHARE-client-1: 
Server lk version = 1
[2013-11-21 22:57:08.913844] W [client3_1-fops.c:647:client3_1_unlink_cbk] 
0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
[2013-11-21 23:07:09.033899] W [client3_1-fops.c:647:client3_1_unlink_cbk] 
0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
[2013-11-21 23:17:09.160547] W [client3_1-fops.c:647:client3_1_unlink_cbk] 
0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory

From this moment on, 10.243.0.24 lost the /home share.

And the other brick:
[2013-11-21 22:23:01.157049] W [socket.c:195:__socket_rwv] 
0-GLUSTER-SHARE-client-0: readv failed (Connection timed out)
[2013-11-21 22:23:01.157145] W [socket.c:1512:__socket_proto_state_machine] 
0-GLUSTER-SHARE-client-0: reading from socket failed. Error (Connection timed 
out), peer (10.243.0.23:24010)
[2013-11-21 22:23:01.157198] I [client.c:2090:client_rpc_notify] 
0-GLUSTER-SHARE-client-0: disconnected
[2013-11-21 22:23:05.829005] W [socket.c:195:__socket_rwv] 
0-GLUSTER-HOME-client-0: readv failed (Connection timed out)
[2013-11-21 22:23:05.829068] W [socket.c:1512:__socket_proto_state_machine] 
0-GLUSTER-HOME-client-0: reading from socket failed. Error (Connection timed 
out), peer (10.243.0.23:24009)
[2013-11-21 22:23:05.829128] I [client.c:2090:client_rpc_notify] 
0-GLUSTER-HOME-client-0: disconnected
[2013-11-21 22:23:13.437077] E [socket.c:1715:socket_connect_finish] 
0-GLUSTER-SHARE-client-0: connection to 10.243.0.23:24010 failed (No route to 
host)
[2013-11-21 22:23:16.437010] E [socket.c:1715:socket_connect_finish] 
0-GLUSTER-HOME-client-0: connection to 10.243.0.23:24009 failed (No route to 
host)
[2013-11-21 22:25:53.476614] I 
[client-handshake.c:1614:select_server_supported_programs] 
0-GLUSTER-SHARE-client-0: Using Program GlusterFS 3.3.2, Num (1298437), Version 
(330)
[2013-11-21 22:25:53.477421] I [client-handshake.c:1411:client_setvolume_cbk] 
0-GLUSTER-SHARE-client-0: Connected to 10.243.0.23:24010, attached to remote 
volume '/data/export-share-1'.
[2013-11-21 22:25:53.477448] I [client-handshake.c:1432:client_setvolume_cbk] 
0-GLUSTER-SHARE-client-0: Server and Client lk-version numbers are same, no 
need to reopen the fds
[2013-11-21 22:25:56.482419] I 
[client-handshake.c:1614:select_server_supported_programs] 
0-GLUSTER-HOME-client-0: Using Program GlusterFS 3.3.1, Num (1298437), Version 
(330)
[2013-11-21 22:25:56.484738] I [client-handshake.c:1411:client_setvolume_cbk] 
0-GLUSTER-HOME-client-0: Connected to 10.243.0.23:24009, attached to remote 
volume '/data/export-home-1'.
[2013-11-21 22:25:56.484769] I [client-handshake.c:1432:client_setvolume_cbk] 
0-GLUSTER-HOME-client-0: Server and Client lk-version numbers are same, no need 
to reopen the fds
[2013-11-21 22:26:52.486420] E [afr-self-heald.c:418:_crawl_proceed] 
0-GLUSTER-HOME-replicate-0: Stopping crawl as < 2 children are up
[2013-11-21 22:30:02.519308] E [afr-self-heald.c:418:_crawl_proceed] 
0-GLUSTER-SHARE-replicate-0: Stopping crawl as < 2 children are up
[2013-11-21 22:36:52.593154] E [afr-self-heald.c:418:_crawl_proceed] 
0-GLUSTER-HOME-replicate-0: Stopping crawl as < 2 children are up
[2013-11-21 22:40:02.627325] E [afr-self-heald.c:418:_crawl_proceed] 
0-GLUSTER-SHARE-replicate-0: Stopping crawl as < 2 children are up
[2013-11-21 22:49:48.238564] E [afr-self-heald.c:418:_crawl_proceed] 
0-GLUSTER-HOME-replicate-0: Stopping crawl as < 2 children are up
[2013-11-21 22:49:50.245822] W [socket.c:195:__socket_rwv] 
0-GLUSTER-HOME-client-0: readv failed (Connection reset by peer)
[2013-11-21 22:49:50.245873] W [socket.c:1512:__socket_proto_state_machine] 
0-GLUSTER-HOME-client-0: reading from socket failed. Error (Connection reset by 
peer), peer (10.243.0.23:24009)
[2013-11-21 22:49:50.245913] I [client.c:2090:client_rpc_notify] 
0-GLUSTER-HOME-client-0: disconnected
[2013-11-21 22:49:50.245931] W [socket.c:195:__socket_rwv] 
0-GLUSTER-SHARE-client-0: readv failed (Connection reset by peer)
[2013-11-21 22:49:50.245943] W [socket.c:1512:__socket_proto_state_machine] 
0-GLUSTER-SHARE-client-0: reading from socket failed. Error (Connection reset 
by peer), peer (10.243.0.23:24010)
[2013-11-21 22:49:50.245965] I [client.c:2090:client_rpc_notify] 
0-GLUSTER-SHARE-client-0: disconnected
[2013-11-21 22:50:01.243099] I 
[client-handshake.c:1614:select_server_supported_programs] 
0-GLUSTER-HOME-client-0: Using Program GlusterFS 3.3.1, Num (1298437), Version 
(330)
[2013-11-21 22:50:01.243299] I 
[client-handshake.c:1614:select_server_supported_programs] 
0-GLUSTER-SHARE-client-0: Using Program GlusterFS 3.3.2, Num (1298437), Version 
(330)
[2013-11-21 22:50:01.244103] I [client-handshake.c:1411:client_setvolume_cbk] 
0-GLUSTER-HOME-client-0: Connected to 10.243.0.23:24009, attached to remote 
volume '/data/export-home-1'.
[2013-11-21 22:50:01.244154] I [client-handshake.c:1423:client_setvolume_cbk] 
0-GLUSTER-HOME-client-0: Server and Client lk-version numbers are not same, 
reopening the fds
[2013-11-21 22:50:01.244918] I [client-handshake.c:1411:client_setvolume_cbk] 
0-GLUSTER-SHARE-client-0: Connected to 10.243.0.23:24010, attached to remote 
volume '/data/export-share-1'.
[2013-11-21 22:50:01.244945] I [client-handshake.c:1423:client_setvolume_cbk] 
0-GLUSTER-SHARE-client-0: Server and Client lk-version numbers are not same, 
reopening the fds
[2013-11-21 22:50:01.246500] I 
[client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-SHARE-client-0: 
Server lk version = 1
[2013-11-21 22:50:01.246551] I 
[client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-HOME-client-0: 
Server lk version = 1

Home volume configuration:
Volume Name: GLUSTER-HOME
Type: Replicate
Volume ID: 27ac2466-584e-491a-9717-2ed4869b1c28
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: file1.cluster.peercode.nl:/data/export-home-1
Brick2: file2.cluster.peercode.nl:/data/export-home-2
Options Reconfigured:
auth.allow: 10.243.0.*
features.quota: on
features.limit-usage: /:30gb

Installed packages:
$ dpkg -l gluster*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                   Version                Description
+++-======================-======================-============================================================
un  glusterfs              <none>                 (no description available)
ii  glusterfs-client       3.3.2-ubuntu1~precise2 clustered file-system (client package)
ii  glusterfs-common       3.3.2-ubuntu1~precise2 GlusterFS common libraries and translator modules
ii  glusterfs-server       3.3.2-ubuntu1~precise2 clustered file-system (server package)

Ubuntu 12.04

Best,

Mark Ruys



---
dr M.P.J. Ruys (PhD)   ::                Peercode
Oudenhof 4c, 4191NW Geldermalsen, The Netherlands
Web site and travel directions:   www.peercode.nl
Phone +31.88.0084124   ::   Mobile +31.6.51298623

_______________________________________________
Gluster-users mailing list
http://supercolony.gluster.org/mailman/listinfo/gluster-users
