Dear all,

I have run into a problem with a glusterfs volume (pure distributed, _not_
dispersed) over RDMA transport.  One user has a directory with a large number
of files (50,000), and simply running "ls" in this directory yields a
"Transport endpoint is not connected" error. As a result, "ls" shows only some
of the files, not all of them.

The respective log file shows this error message:

[2018-05-20 20:38:25.114978] W [MSGID: 114031] 
[client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-0: remote 
operation failed [Transport endpoint is not connected]
[2018-05-20 20:38:27.732796] W [MSGID: 103046] 
[rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer 
(10.100.245.18:49153), couldn't encode or decode the msg properly or write 
chunks were not provided for replies that were bigger than 
RDMA_INLINE_THRESHOLD (2048)
[2018-05-20 20:38:27.732844] W [MSGID: 114031] 
[client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-3: remote 
operation failed [Transport endpoint is not connected]
[2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk] 
0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not connected)
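
To see how often each of these warnings occurs, the mount log could be
summarised with a short script. This is only a sketch: it assumes each log
entry sits on a single line in the file on disk (the excerpt above was wrapped
by the mail client), and the names LOG_LINE and count_warnings are made up for
illustration.

```python
import re
from collections import Counter

# Matches entries of the form shown above, e.g.
# [timestamp] W [MSGID: 114031] [file.c:123:function] message
# The [MSGID: ...] part is optional (the fuse-bridge entry has none).
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<rest>.*)$"
)


def count_warnings(lines):
    """Count warning (W) and error (E) entries per reporting function."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match.group("level") not in ("W", "E"):
            continue
        # src looks like "rdma.c:4089:gf_rdma_process_recv"; keep the function.
        function = match.group("src").split(":")[-1]
        counts[function] += 1
    return counts
```

Passing an open log file, for example count_warnings(open(path)) with the path
of the mount log, returns a Counter keyed by function name, so a burst of
gf_rdma_process_recv warnings next to the readdirp failures is easy to spot.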

I already set the memlock limit for glusterd to unlimited, but the problem 
persists. 

Only switching from RDMA transport to TCP transport solved the problem.  (I am
now running the volume in mixed mode, config.transport=tcp,rdma.)  Mounting
with transport=rdma shows this error; mounting with transport=tcp is fine.

However, the problem does not occur in all large directories, and I have not
yet recognized a pattern.
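
One way to look for a pattern would be to walk the tree on each mount and
record, per directory, how many entries could be listed and which error (if
any) stopped the listing. The following is a rough sketch of that idea, not
something I have run against the affected volume; the function names
list_directory and survey are invented for this example.

```python
import errno
import os


def list_directory(path):
    """Return (names, error_name); error_name is None if the listing completed."""
    names = []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                names.append(entry.name)
    except OSError as exc:
        # ENOTCONN is the errno behind "Transport endpoint is not connected".
        return names, errno.errorcode.get(exc.errno, str(exc.errno))
    return names, None


def survey(root):
    """Yield (directory, entry_count, error_name) for root and its subdirectories."""
    pending = [root]
    while pending:
        current = pending.pop()
        names, error = list_directory(current)
        yield current, len(names), error
        for name in names:
            child = os.path.join(current, name)
            # Descend into real directories only, to avoid symlink loops.
            if os.path.isdir(child) and not os.path.islink(child):
                pending.append(child)
```

Running survey() once on a transport=rdma mount and once on a transport=tcp
mount of the same volume, then comparing the counts per directory, should show
which directories are affected and whether the entry count alone explains it.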

I'm using glusterfs v3.12.6 on the servers, with QDR InfiniBand HCAs.

Is this a known issue with RDMA transport?

best wishes,
Stefan

