I just stood up a new cluster running 4.1.7, my first experience with version 
4. It is a simple replica 3 volume:

gluster v create la1_db_1 replica 3 \
        gluster-10g-1:/gluster-bricks/la1_db_1/la1_db_1 \
        gluster-10g-2:/gluster-bricks/la1_db_1/la1_db_1 \
        gluster-10g-3:/gluster-bricks/la1_db_1/la1_db_1

gluster v set la1_db_1 storage.owner-uid 130
gluster v set la1_db_1 storage.owner-gid 130
gluster v set la1_db_1 server.allow-insecure on
gluster v set la1_db_1 auth.allow [various IPs]

After mounting on a client, everything appears fine until you try to use it.

dd if=/dev/zero of=/path/on/client/foo

will write 512 bytes and then hang until timeout, at which point it declares 
"Transport not connected".

Notably, if I mount the volume on one of the gluster machines over the same 
interface, it behave like it should. That led me to investigate packet 
filtering, which is configured correctly, and in any case, after flushing all 
rules on all involved machines, the issue persists.

cli.log contains a lot of:

[2019-03-06 17:00:02.893553] I [cli.c:773:main] 0-cli: Started running 
/sbin/gluster with version 4.1.7
[2019-03-06 17:00:02.897199] I [cli-cmd-volume.c:2375:cli_check_gsync_present] 
0-: geo-replication not installed
[2019-03-06 17:00:02.897545] I [MSGID: 101190] 
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 1
[2019-03-06 17:00:02.897617] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now
[2019-03-06 17:00:02.897678] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: 
error returned while attempting to connect to host:(null), port:0
[2019-03-06 17:00:02.898244] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2019-03-06 17:00:02.922637] I [cli.c:773:main] 0-cli: Started running 
/sbin/gluster with version 4.1.7
[2019-03-06 17:00:02.926599] I [cli-cmd-volume.c:2375:cli_check_gsync_present] 
0-: geo-replication not installed
[2019-03-06 17:00:02.926906] I [MSGID: 101190] 
[event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 1
[2019-03-06 17:00:02.926956] I [socket.c:2632:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now
[2019-03-06 17:00:02.927113] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: 
error returned while attempting to connect to host:(null), port:0
[2019-03-06 17:00:02.927573] I [input.c:31:cli_batch] 0-: Exiting with: 0

The client log is more interesting, I just don't know what to make of it:

[2019-03-07 19:18:36.674687] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-0: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:18:36.674726] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-1: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:18:36.674752] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-2: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:18:36.674806] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 
0-la1_db_1-client-0: changing port to 49152 (from 0)
[2019-03-07 19:18:36.674815] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 
0-la1_db_1-client-1: changing port to 49152 (from 0)
[2019-03-07 19:18:36.674927] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 
0-la1_db_1-client-2: changing port to 49152 (from 0)
[2019-03-07 19:18:36.675012] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-1: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:18:36.675054] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-0: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:18:36.675155] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-2: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:18:36.675203] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-1: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:18:36.675243] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-0: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:18:36.675306] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-2: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:18:36.675563] I [MSGID: 114046] 
[client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-1: Connected 
to la1_db_1-client-1, attached to remote volume 
'/gluster-bricks/la1_db_1/la1_db_1'.
[2019-03-07 19:18:36.675573] I [MSGID: 108005] 
[afr-common.c:5336:__afr_handle_child_up_event] 0-la1_db_1-replicate-0: 
Subvolume 'la1_db_1-client-1' came back up; going online.
[2019-03-07 19:18:36.675722] I [MSGID: 114046] 
[client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-0: Connected 
to la1_db_1-client-0, attached to remote volume 
'/gluster-bricks/la1_db_1/la1_db_1'.
[2019-03-07 19:18:36.675728] I [MSGID: 114046] 
[client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-2: Connected 
to la1_db_1-client-2, attached to remote volume 
'/gluster-bricks/la1_db_1/la1_db_1'.
[2019-03-07 19:18:36.675743] I [MSGID: 108002] [afr-common.c:5611:afr_notify] 
0-la1_db_1-replicate-0: Client-quorum is met
[2019-03-07 19:18:36.676564] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2019-03-07 19:18:36.676578] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: 
switched to graph 0
[2019-03-07 19:18:36.677623] I [MSGID: 109005] 
[dht-selfheal.c:2342:dht_selfheal_directory] 0-la1_db_1-dht: Directory selfheal 
failed: Unable to form layout for directory /
[2019-03-07 19:20:01.674361] C 
[rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-0: server 
172.16.0.171:49152 has not responded in the last 42 seconds, disconnecting.
[2019-03-07 19:20:01.674440] C 
[rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-1: server 
172.16.0.172:49152 has not responded in the last 42 seconds, disconnecting.
[2019-03-07 19:20:01.674458] C 
[rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-2: server 
172.16.0.174:49152 has not responded in the last 42 seconds, disconnecting.
[2019-03-07 19:20:01.674455] I [MSGID: 114018] 
[client.c:2254:client_rpc_notify] 0-la1_db_1-client-0: disconnected from 
la1_db_1-client-0. Client process will keep trying to connect to glusterd until 
brick's port is available
[2019-03-07 19:20:01.674530] I [MSGID: 114018] 
[client.c:2254:client_rpc_notify] 0-la1_db_1-client-1: disconnected from 
la1_db_1-client-1. Client process will keep trying to connect to glusterd until 
brick's port is available
[2019-03-07 19:20:01.674557] W [MSGID: 108001] [afr-common.c:5618:afr_notify] 
0-la1_db_1-replicate-0: Client-quorum is not met
[2019-03-07 19:20:01.674776] E [rpc-clnt.c:348:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 
0-la1_db_1-client-0: forced unwinding frame type(GlusterFS 4.x v1) 
op(LOOKUP(27)) called at 2019-03-07 19:18:47.415290 (xid=0x11)
[2019-03-07 19:20:01.674798] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-0: remote 
operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) [Transport 
endpoint is not connected]
[2019-03-07 19:20:01.674808] E [rpc-clnt.c:348:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 
0-la1_db_1-client-1: forced unwinding frame type(GlusterFS 4.x v1) 
op(LOOKUP(27)) called at 2019-03-07 19:18:47.415303 (xid=0x15)
[2019-03-07 19:20:01.674830] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-1: remote 
operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) [Transport 
endpoint is not connected]
[2019-03-07 19:20:01.674888] E [rpc-clnt.c:348:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 
0-la1_db_1-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 
2019-03-07 19:19:19.670189 (xid=0x12)
[2019-03-07 19:20:01.674902] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 
0-la1_db_1-client-0: socket disconnected
[2019-03-07 19:20:01.674916] E [rpc-clnt.c:348:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 
0-la1_db_1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 
2019-03-07 19:19:19.670216 (xid=0x16)
[2019-03-07 19:20:01.674932] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 
0-la1_db_1-client-1: socket disconnected
[2019-03-07 19:20:01.674935] I [MSGID: 114018] 
[client.c:2254:client_rpc_notify] 0-la1_db_1-client-2: disconnected from 
la1_db_1-client-2. Client process will keep trying to connect to glusterd until 
brick's port is available
[2019-03-07 19:20:01.674947] E [MSGID: 108006] 
[afr-common.c:5413:__afr_handle_child_down_event] 0-la1_db_1-replicate-0: All 
subvolumes are down. Going offline until atleast one of them comes back up.
[2019-03-07 19:20:01.675040] E [rpc-clnt.c:348:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 
0-la1_db_1-client-2: forced unwinding frame type(GlusterFS 4.x v1) 
op(LOOKUP(27)) called at 2019-03-07 19:18:47.415315 (xid=0x11)
[2019-03-07 19:20:01.675055] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-2: remote 
operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) [Transport 
endpoint is not connected]
[2019-03-07 19:20:01.675077] E [MSGID: 101046] 
[dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null
[2019-03-07 19:20:01.675097] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 
0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve (Transport 
endpoint is not connected)
[2019-03-07 19:20:01.675164] I [MSGID: 108006] 
[afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up
[2019-03-07 19:20:01.675176] I [MSGID: 108006] 
[afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up
[2019-03-07 19:20:01.675179] E [MSGID: 101046] 
[dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null
[2019-03-07 19:20:01.675194] W [fuse-bridge.c:565:fuse_entry_cbk] 
0-glusterfs-fuse: 18: LOOKUP() /foo => -1 (Transport endpoint is not connected)
[2019-03-07 19:20:01.675280] I [MSGID: 108006] 
[afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up
[2019-03-07 19:20:01.675282] E [rpc-clnt.c:348:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be]
 (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 
0-la1_db_1-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 
2019-03-07 19:19:19.670241 (xid=0x12)
[2019-03-07 19:20:01.675308] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 
0-la1_db_1-client-2: socket disconnected
[2019-03-07 19:20:01.675310] I [MSGID: 108006] 
[afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up
[2019-03-07 19:20:01.675314] E [MSGID: 101046] 
[dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null
[2019-03-07 19:20:01.675332] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 
0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve (Transport 
endpoint is not connected)
[2019-03-07 19:20:01.675379] I [MSGID: 108006] 
[afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up
[2019-03-07 19:20:01.675390] I [MSGID: 108006] 
[afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up
[2019-03-07 19:20:01.675393] E [MSGID: 101046] 
[dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null
[2019-03-07 19:20:01.675405] W [fuse-bridge.c:565:fuse_entry_cbk] 
0-glusterfs-fuse: 20: LOOKUP() /foo => -1 (Transport endpoint is not connected)
[2019-03-07 19:20:01.675447] I [MSGID: 108006] 
[afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up
[2019-03-07 19:20:01.675470] I [MSGID: 108006] 
[afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up
[2019-03-07 19:20:01.675472] E [MSGID: 101046] 
[dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null
[2019-03-07 19:20:01.675485] W [fuse-bridge.c:899:fuse_attr_cbk] 
0-glusterfs-fuse: 21: LOOKUP() / => -1 (Transport endpoint is not connected)
[2019-03-07 19:20:11.675946] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-0: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.675989] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-1: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676040] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-2: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676101] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-0: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676138] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-1: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 
0-la1_db_1-client-0: changing port to 49152 (from 0)
[2019-03-07 19:20:11.676261] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-2: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676271] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 
0-la1_db_1-client-1: changing port to 49152 (from 0)
[2019-03-07 19:20:11.676293] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-0: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676358] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 
0-la1_db_1-client-2: changing port to 49152 (from 0)
[2019-03-07 19:20:11.676455] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-0: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676477] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-1: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676571] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-2: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676684] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-1: error returned while attempting to connect to host:(null), 
port:0
[2019-03-07 19:20:11.676753] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-2: error returned while attempting to connect to host:(null), 
port:0


This is a 10G network that is otherwise performing flawlessly, and I'm seeing 
no errors on the switches; as mentioned, I've tried this both in our normal 
configuration and with iptables flushed. Not entirely sure what to try next; 
any suggestions?

Thanks,

-j
_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users

Reply via email to