+Amar, +Rafi - other maintainers and peers of transport/rdma

* Can you attach logs from the client and the bricks? Please set diagnostics.client-log-level and diagnostics.brick-log-level to TRACE before starting your tests.
* Does the fuse client recover from the hang?
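For reference, here is a minimal sketch of the log-level change requested above. It assumes the volume is named `db`, as in the original report, and that it is run on one of the server nodes. `enable_trace_logging` is a hypothetical wrapper name used only for illustration. The `gluster volume set` and `gluster volume get` subcommands are the standard CLI.

```shell
#!/usr/bin/env bash
# Sketch: raise client- and brick-side log levels to TRACE before reproducing the hang.
# Assumption: the volume is called "db" (as in the original report).
# enable_trace_logging is a hypothetical helper name, not part of the gluster CLI.

enable_trace_logging() {
  local volume="${1:-db}"
  # Log level used by clients (e.g. the fuse mount) of this volume.
  gluster volume set "$volume" diagnostics.client-log-level TRACE || return 1
  # Log level used by the brick processes serving this volume.
  gluster volume set "$volume" diagnostics.brick-log-level TRACE || return 1
  # Print the effective values so they can be confirmed in the reply.
  gluster volume get "$volume" diagnostics.client-log-level
  gluster volume get "$volume" diagnostics.brick-log-level
}
```

Run `enable_trace_logging db` once, then reproduce the node reboot. The fuse client log is typically named after the mount point (for `/db`, `/var/log/glusterfs/db.log`), and brick logs are usually under `/var/log/glusterfs/bricks/`. TRACE output grows quickly, so setting both options back to INFO after the test is advisable.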
I think we might not be handling the poll_err path correctly. The fact that we see these issues only after a brick reboots makes me suspect the error path.

regards,
Raghavendra

On Wed, Apr 25, 2018 at 6:05 PM, Necati E. SISECI <[email protected]> wrote:

> Thank you for your mail.
>
> ibv_rc_pingpong seems to work between the servers and the client. udaddy,
> ucmatose, rping, etc. are also working.
>
> root@gluster1:~# ibv_rc_pingpong -d mlx5_0 -g 0
> local address: LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
> remote address: LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
> 8192000 bytes in 0.01 seconds = 7964.03 Mbit/sec
> 1000 iters in 0.01 seconds = 8.23 usec/iter
>
> root@cinder:~# ibv_rc_pingpong -g 0 -d mlx5_0 gluster1
> local address: LID 0x0000, QPN 0x00014c, PSN 0x09402b, GID fe80::ee0d:9aff:fec0:1b14
> remote address: LID 0x0000, QPN 0x0001e4, PSN 0x10090e, GID fe80::ee0d:9aff:fec0:1dc8
> 8192000 bytes in 0.01 seconds = 8424.73 Mbit/sec
> 1000 iters in 0.01 seconds = 7.78 usec/iter
>
> Thank you.
>
> Necati.
>
> On 25-04-2018 12:27, Raghavendra Gowdappa wrote:
>
> Is InfiniBand itself working fine? You can run tools like ibv_rc_pingpong
> to find out.
>
> On Wed, Apr 25, 2018 at 12:23 PM, Necati E. SISECI <[email protected]> wrote:
>
>> Dear Gluster-Users,
>>
>> I am experiencing RDMA problems.
>>
>> I have installed Ubuntu 16.04.4 (kernel 4.15.0-13-generic) and
>> MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64 on 4 different servers.
>> All of them have Mellanox ConnectX-4 LX dual-port NICs. These four servers
>> are connected via a Mellanox SN2100 switch.
>>
>> I have installed GlusterFS Server v3.10 (from the Ubuntu PPA) on 3 servers.
>> These 3 boxes are running as a gluster cluster. Additionally, I have
>> installed the GlusterFS client on the last one.
>>
>> I have created the Gluster volume with this command:
>>
>> # gluster volume create db transport rdma replica 3 arbiter 1 gluster1:/storage/db/ gluster2:/storage/db/ cinder:/storage/db force
>>
>> (network.ping-timeout is 3)
>>
>> Then I mounted this volume using the mount command below.
>>
>> mount -t glusterfs -o transport=rdma gluster1:/db /db
>>
>> After mounting "/db", I can access the files.
>>
>> The problem is that when I reboot one of the cluster nodes, the fuse client
>> logs the error below and hangs.
>>
>> [2018-04-17 07:42:55.506422] W [MSGID: 103070] [rdma.c:4284:gf_rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send work request on `mlx5_0' returned error wc.status = 5, wc.vendor_err = 245, post->buf = 0x7f8b92016000, wc.byte_len = 0, post->reused = 135
>>
>> When I change the transport mode from rdma to tcp, the fuse client works
>> well. No hangs.
>>
>> I also tried Gluster 3.8, 3.10, 4.0.0 and 4.0.1 (from Ubuntu PPAs) on
>> Ubuntu 16.04.4 and CentOS 7.4, but the results were the same.
>>
>> Thank you.
>> Necati.
>>
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://lists.gluster.org/mailman/listinfo/gluster-users
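One detail in the reported log line may help narrow this down. In libibverbs' `enum ibv_wc_status` (`<infiniband/verbs.h>`), the value 5 is `IBV_WC_WR_FLUSH_ERR`. Verbs reports this status for work requests that were outstanding or posted after the queue pair moved to the error state, for example because the remote side went away. That fits a failure seen only after a node reboots, though it does not by itself show where the hang comes from. The sketch below extracts `wc.status` from a gluster rdma log line and maps it to its enum name. The helper names are hypothetical, and only values 0 to 13 are covered. `wc.vendor_err` (245 here) is vendor-specific and is left uninterpreted.

```shell
#!/usr/bin/env bash
# Sketch: decode the wc.status field printed by gluster's rdma transport.
# Names mirror enum ibv_wc_status in <infiniband/verbs.h> (values 0..13 only).
# wc_status_from_log and decode_wc_status are hypothetical helper names.

wc_status_from_log() {
  # Print the integer that follows "wc.status = " in a log line (nothing if absent).
  printf '%s\n' "$1" | sed -n 's/.*wc\.status = \([0-9][0-9]*\).*/\1/p'
}

decode_wc_status() {
  case "$1" in
    0)  echo IBV_WC_SUCCESS ;;
    1)  echo IBV_WC_LOC_LEN_ERR ;;
    2)  echo IBV_WC_LOC_QP_OP_ERR ;;
    3)  echo IBV_WC_LOC_EEC_OP_ERR ;;
    4)  echo IBV_WC_LOC_PROT_ERR ;;
    5)  echo IBV_WC_WR_FLUSH_ERR ;;   # WR flushed because the QP is in the error state
    6)  echo IBV_WC_MW_BIND_ERR ;;
    7)  echo IBV_WC_BAD_RESP_ERR ;;
    8)  echo IBV_WC_LOC_ACCESS_ERR ;;
    9)  echo IBV_WC_REM_INV_REQ_ERR ;;
    10) echo IBV_WC_REM_ACCESS_ERR ;;
    11) echo IBV_WC_REM_OP_ERR ;;
    12) echo IBV_WC_RETRY_EXC_ERR ;;
    13) echo IBV_WC_RNR_RETRY_EXC_ERR ;;
    *)  echo "UNKNOWN($1)" ;;
  esac
}
```

For example, `decode_wc_status "$(wc_status_from_log "$line")"` prints `IBV_WC_WR_FLUSH_ERR` when `$line` holds the log entry quoted above. A case statement keeps the mapping dependency-free; on a machine with libibverbs installed, `ibv_wc_status_str()` gives the human-readable description instead.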
