Hello, I have been reading your mailing list for some weeks and I am quite impressed by the knowledge and information shared here.
We have a GPFS cluster with 4 NSDs and 120 clients on InfiniBand. Our NSD servers have two InfiniBand ports on separate cards, mlx5_0 and mlx5_1. RDMA-CM and IPv6 are enabled on all nodes, and we have added an IPoIB IP address to all interfaces. But when we enable the second interface, we get the following errors from all nodes:

2018-03-12_20:49:38.923+0100: [E] VERBS RDMA closed connection to 10.100.0.83 (hilbert83-ib) on mlx5_1 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 45
2018-03-12_20:49:38.923+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 10.100.0.129 (hilbert129-ib) on mlx5_1 port 1 fabnum 0 vendor_err 129
2018-03-12_20:49:38.923+0100: [E] VERBS RDMA closed connection to 10.100.0.129 (hilbert129-ib) on mlx5_1 port 1 fabnum 0 due to RDMA read error IBV_WC_RETRY_EXC_ERR index 31
2018-03-12_20:49:38.923+0100: [E] VERBS RDMA rdma read error IBV_WC_RETRY_EXC_ERR to 10.100.0.134 (hilbert134-ib) on mlx5_1 port 1 fabnum 0 vendor_err 129

I have read that this issue can happen when verbsRdmasPerConnection is too low. We increased the value and it got better, but the problem is not fixed.
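To check whether the errors are confined to mlx5_1 or to particular clients, the log lines can be tallied per device and per peer. This is a minimal Python sketch, assuming the message layout quoted above. The names VERBS_ERR_RE and tally_verbs_errors and the regular expression are illustrative and not part of GPFS.

```python
import re
from collections import Counter

# Matches both "[E] VERBS RDMA ..." message shapes quoted above and captures
# the peer IP, peer host name, local device, port and fabric number.
# Assumption: other GPFS releases may format these messages differently.
VERBS_ERR_RE = re.compile(
    r"\[E\] VERBS RDMA .*? to (?P<ip>\S+) \((?P<host>[^)]+)\) "
    r"on (?P<dev>\S+) port (?P<port>\d+) fabnum (?P<fabnum>\d+)"
)


def tally_verbs_errors(lines):
    """Count VERBS RDMA error lines per (device, port) and per peer host."""
    by_port = Counter()
    by_host = Counter()
    for line in lines:
        m = VERBS_ERR_RE.search(line)
        if m is None:
            continue  # not a VERBS RDMA error line in the expected layout
        by_port[(m["dev"], int(m["port"]))] += 1
        by_host[m["host"]] += 1
    return by_port, by_host
```

Passing the lines of a node's GPFS log to tally_verbs_errors returns two counters. If every key in the first counter is mlx5_1, that would suggest looking at the second port or its fabric path rather than at the RDMA tuning values alone.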
Current config:

minReleaseLevel 4.2.3.0
maxblocksize 16m
cipherList AUTHONLY
cesSharedRoot /ces
ccrEnabled yes
failureDetectionTime 40
leaseRecoveryWait 40
[hilbert1-ib,hilbert2-ib]
worker1Threads 256
maxReceiverThreads 256
[common]
tiebreakerDisks vd3;vd5;vd7
minQuorumNodes 2
verbsLibName libibverbs.so.1
verbsRdma enable
verbsRdmasPerNode 256
verbsRdmaSend no
scatterBufferSize 262144
pagepool 16g
verbsPorts mlx4_0/1
[nsdNodes]
verbsPorts mlx5_0/1 mlx5_1/1
[hilbert200-ib,hilbert201-ib,hilbert202-ib,hilbert203-ib,hilbert204-ib,hilbert205-ib,hilbert206-ib]
verbsPorts mlx4_0/1 mlx4_1/1
[common]
maxMBpS 11200
[common]
verbsRdmaCm enable
verbsRdmasPerConnection 14
adminMode central

Kind regards
Philipp Rehs

---------------------------
Zentrum für Informations- und Medientechnologie
Kompetenzzentrum für wissenschaftliches Rechnen und Speichern
Heinrich-Heine-Universität Düsseldorf
Universitätsstr. 1
Raum 25.41.00.51
40225 Düsseldorf / Germany
Tel: +49-211-81-15557
