Hello gluster users and professionals,

We are running a GlusterFS 3.10.10 distributed volume (9 nodes) over RDMA transport.

From time to time applications crash with I/O errors (can't access file), and in the client logs we see messages like:

[2018-05-04 10:00:43.467490] W [MSGID: 114031] [client-rpc-fops.c:2640:client3_3_readdirp_cbk] 0-gv0-client-2: remote operation failed [Transport endpoint is not connected]
[2018-05-04 10:00:43.467585] W [MSGID: 103046] [rdma.c:3603:gf_rdma_decode_header] 0-rpc-transport/rdma: received a msg of type RDMA_ERROR
[2018-05-04 10:00:43.467601] W [MSGID: 103046] [rdma.c:4055:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (192.168.2.104:49152), couldn't encode or decode the msg properly or write chunks were not provided for replies that were bigger than RDMA_INLINE_THRESHOLD (2048)

At the same time, the brick logs on the gluster nodes show:
[2018-05-04 10:00:43.468470] W [MSGID: 103027] [rdma.c:2498:__gf_rdma_send_reply_type_nomsg] 0-rpc-transport/rdma: encoding write chunks failed
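
To line up the client and brick messages by timestamp and message ID, something like the following could be used. It is only a minimal sketch: the line layout in LOG_RE is inferred from the excerpts above rather than from any GlusterFS documentation, and parse_line and count_by_msgid are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed layout, inferred from the log excerpts in this post:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[MSGID: (?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+"
    r"(?P<msg>.*)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def count_by_msgid(lines):
    """Count how many parsed lines carry each MSGID; unparsed lines are skipped."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec:
            counts[rec["msgid"]] += 1
    return counts
```

Running count_by_msgid over the lines of a client log and a brick log for the same time window would show which message IDs (for example 103046 and 103027) appear together.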

The gluster volume is mounted with options "backupvolfile-server=cn03-ib,transport=rdma,log-level=WARNING"


The same applications run without problems on non-Gluster file systems. Could you please help us debug and fix this?




# gluster volume status gv0
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick cn01-ib:/gfs/gv0/brick1/brick         0         49152      Y       3984
Brick cn02-ib:/gfs/gv0/brick1/brick         0         49152      Y       3352
Brick cn03-ib:/gfs/gv0/brick1/brick         0         49152      Y       3333
Brick cn04-ib:/gfs/gv0/brick1/brick         0         49152      Y       3079
Brick cn05-ib:/gfs/gv0/brick1/brick         0         49152      Y       3093
Brick cn06-ib:/gfs/gv0/brick1/brick         0         49152      Y       3148
Brick cn07-ib:/gfs/gv0/brick1/brick         0         49152      Y       2995
Brick cn08-ib:/gfs/gv0/brick1/brick         0         49152      Y       3107
Brick cn09-ib:/gfs/gv0/brick1/brick         0         49152      Y       3014

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

# gluster volume info gv0

Volume Name: gv0
Type: Distribute
Volume ID: 5ee4b6a4-b8d2-4795-919f-c992b95d6221
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: rdma
Bricks:
Brick1: cn01-ib:/gfs/gv0/brick1/brick
Brick2: cn02-ib:/gfs/gv0/brick1/brick
Brick3: cn03-ib:/gfs/gv0/brick1/brick
Brick4: cn04-ib:/gfs/gv0/brick1/brick
Brick5: cn05-ib:/gfs/gv0/brick1/brick
Brick6: cn06-ib:/gfs/gv0/brick1/brick
Brick7: cn07-ib:/gfs/gv0/brick1/brick
Brick8: cn08-ib:/gfs/gv0/brick1/brick
Brick9: cn09-ib:/gfs/gv0/brick1/brick
Options Reconfigured:
performance.cache-size: 1GB
server.event-threads: 8
client.event-threads: 8
cluster.nufa: on
performance.readdir-ahead: on
performance.parallel-readdir: on
nfs.disable: on





--
Best regards,
Anatoliy
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
