I just updated some of my clients to RHEL 7.7, Lustre 2.12.3, MOFED 4.7. Server version is 2.10.8.
I'm now getting errors mounting the filesystem on the client. In fact, I can't even do an 'lctl ping' to any of the servers without getting an I/O error.

Debug logs show this message when I attempt an lctl ping:

00000800:00020000:0.0:1581538955.090767:0:20471:0:(o2iblnd.c:941:kiblnd_create_conn()) Can't create QP: -12, send_wr: 32634, recv_wr: 254, send_sge: 2, recv_sge: 1

# lctl list_nids
10.11.80.65@o2ib3
# lctl ping 10.11.80.50@o2ib3
failed to ping 10.11.80.50@o2ib3: Input/output error

Interestingly, if I do an 'lctl ping' to the client _from_ the server, the ping succeeds, and from that point on pings from client _to_ server work fine until the client is rebooted or lnet is reloaded.

ko2iblnd parameters match on clients and servers, namely:

options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1

Anyone have any thoughts?

Thanks,
Kevin

--
Kevin Hildebrand
University of Maryland
Division of IT
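
For anyone reading the debug message quoted above, the return code and queue-pair sizes can be pulled out of the line mechanically. The following is only a minimal sketch: the helper name parse_qp_error is hypothetical (it is not part of Lustre or lctl), and it assumes the -12 follows the usual kernel convention of a negated errno value, which on Linux would correspond to ENOMEM.

```python
import errno
import re

# Debug line copied verbatim from the lctl ping attempt in this post.
LOG_LINE = (
    "00000800:00020000:0.0:1581538955.090767:0:20471:0:"
    "(o2iblnd.c:941:kiblnd_create_conn()) Can't create QP: -12, "
    "send_wr: 32634, recv_wr: 254, send_sge: 2, recv_sge: 1"
)


def parse_qp_error(line):
    """Hypothetical helper: return (rc, fields) from a "Can't create QP" line.

    rc is the integer return code; fields maps each "name: value" pair
    that follows it (send_wr, recv_wr, ...) to an int.
    """
    match = re.search(r"Can't create QP: (-?\d+)", line)
    if match is None:
        raise ValueError("not a 'Can't create QP' message")
    rc = int(match.group(1))
    # Only scan the text after the return code for "name: value" pairs.
    pairs = re.findall(r"(\w+): (\d+)", line[match.end():])
    return rc, {name: int(value) for name, value in pairs}


if __name__ == "__main__":
    rc, fields = parse_qp_error(LOG_LINE)
    # Assumption: rc is a negated errno, so look up the symbolic name of -rc.
    print(rc, errno.errorcode.get(-rc, "unknown"))
    for name, value in fields.items():
        print(name, value)
```

If that reading is right, the queue pair creation is failing on a resource allocation, and the requested send_wr of 32634 is the obvious value to compare against what the HCA reports it can support.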
