Stefan,

Sounds like a brick process is not running. I have noticed some strangeness in my lab when using RDMA: I often have to forcibly restart the brick process — "often" as in every single time I do a major operation (adding a new volume, removing a volume, stopping a volume, etc.).
Run:

    gluster volume status <vol>

Do any of the self-heal daemons show N/A? If that's the case, try forcing a restart on the volume:

    gluster volume start <vol> force

This would also explain why your volumes aren't being replicated properly.

On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig <[email protected]> wrote:
> Dear all,
>
> I faced a problem with a glusterfs volume (pure distributed, _not_
> dispersed) over RDMA transport. One user had a directory with a large
> number of files (50,000 files), and just doing an "ls" in this directory
> yields a "Transport endpoint not connected" error. The effect is that "ls"
> only shows some files, but not all.
>
> The respective log file shows this error message:
>
> [2018-05-20 20:38:25.114978] W [MSGID: 114031]
> [client-rpc-fops.c:2578:client3_3_readdirp_cbk]
> 0-glurch-client-0: remote operation failed [Transport endpoint is not
> connected]
> [2018-05-20 20:38:27.732796] W [MSGID: 103046]
> [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (
> 10.100.245.18:49153), couldn't encode or decode the msg properly or write
> chunks were not provided for replies that were bigger than
> RDMA_INLINE_THRESHOLD (2048)
> [2018-05-20 20:38:27.732844] W [MSGID: 114031]
> [client-rpc-fops.c:2578:client3_3_readdirp_cbk]
> 0-glurch-client-3: remote operation failed [Transport endpoint is not
> connected]
> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk]
> 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not
> connected)
>
> I already set the memlock limit for glusterd to unlimited, but the problem
> persists.
>
> Only going from RDMA transport to TCP transport solved the problem. (I'm
> running the volume now in mixed mode, config.transport=tcp,rdma.) Mounting
> with transport=rdma shows this error; mounting with transport=tcp is fine.
>
> However, this problem does not arise on all large directories, only on
> some. I haven't recognized a pattern yet.
>
> I'm using glusterfs v3.12.6 on the servers, with QDR InfiniBand HCAs.
>
> Is this a known issue with RDMA transport?
>
> best wishes,
> Stefan
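For anyone hitting the same symptoms, the checks and the workaround discussed above can be sketched as a short shell sequence. This is a hedged sketch, not an official procedure: the volume name "glurch" is only inferred from the client log prefix above, and the server name and mount point are placeholders; substitute your own.

```shell
#!/bin/sh
# Volume name guessed from the "0-glurch-client-*" log prefix; adjust to yours.
VOL=glurch

# 1. Check brick and self-heal daemon status. Entries showing "N/A"
#    instead of a PID/port indicate a process that is not running.
gluster volume status "$VOL"

# 2. Force-(re)start the volume to bring dead brick / self-heal
#    processes back online without touching healthy ones.
gluster volume start "$VOL" force

# 3. Workaround from the original report: run the volume with both
#    transports (the volume must be stopped to change config.transport),
#    then mount clients over TCP instead of RDMA.
gluster volume stop "$VOL"
gluster volume set "$VOL" config.transport tcp,rdma
gluster volume start "$VOL"
mount -t glusterfs -o transport=tcp server:/"$VOL" /mnt/"$VOL"
```

Note that this only works around the RDMA readdirp failure; it does not fix the underlying transport problem.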
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
