Hi,

Is anyone else having problems with clients running the latest 3.10.0-862.11.6.el7 kernel being unable to mount Lustre over OPA?  We've got Lustre 2.10.4 clients on CentOS 7.5, running the OS-provided OPA software packages.  The servers are running Lustre 2.7.16.11.

Attempting to mount fails with errors:
kernel: LNet: Using FMR for registration
kernel: LNet: Added LNI 10.112.0.130@o2ib [128/2048/0/180]
kernel: Lustre: 1872:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1534426797/real 1534426797]  req@ffff8ca9059f0000 x1608963113091088/t0(0) o250->MGC10.112.0.11@[email protected]@o2ib:26/25 lens 520/544 e 0 to 1 dl 1534426802 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
kernel: LustreError: 1632:0:(mgc_request.c:251:do_config_log_add()) MGC10.112.0.11@o2ib: failed processing log, type 1: rc = -5
kernel: LustreError: 1903:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
kernel: Lustre: 1872:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1534426822/real 1534426822]  req@ffff8cb1109b0000 x1608963113091152/t0(0) o250->MGC10.112.0.11@[email protected]@o2ib:26/25 lens 520/544 e 0 to 1 dl 1534426827 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
LustreError: 15c-8: MGC10.112.0.11@o2ib: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
kernel: Lustre: Unmounted lustre-client
mount: mount.lustre: mount 10.112.0.11@o2ib0:10.112.0.12@o2ib0:/lustre at /mnt/fastdata failed: Input/output error
mount: Is the MGS running?
kernel: LustreError: 1632:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount  (-5)
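
For anyone reading the log above, here is a minimal Python sketch that decodes two of its fields. It assumes that "sent" and "dl" in the ptlrpc_expire_one_request() lines are Unix timestamps for the send time and the request deadline, and that the -5 return codes are negated Linux errno values. The helper names decode_request and errno_name are made up for illustration, and LINE is a trimmed copy of the first expiry line (the "..." marks the fields left out).

```python
import errno
import re
from datetime import datetime, timezone

# Trimmed copy of the first expiry line from the log; "..." marks omitted fields.
LINE = ("Request sent has failed due to network error: "
        "[sent 1534426797/real 1534426797] ... e 0 to 1 dl 1534426802 ref 1")


def decode_request(line):
    """Return (send time in UTC, seconds between send and deadline)."""
    sent = int(re.search(r"\[sent (\d+)/", line).group(1))
    deadline = int(re.search(r"\bdl (\d+)\b", line).group(1))
    return datetime.fromtimestamp(sent, tz=timezone.utc), deadline - sent


def errno_name(rc):
    """Map a negative kernel return code such as -5 to its errno symbol."""
    return errno.errorcode.get(-rc, "unknown")


if __name__ == "__main__":
    when, window = decode_request(LINE)
    print(when.isoformat(), window, errno_name(-5))
```

Under those assumptions, the request to the MGS is given up five seconds after it is sent, and -5 is EIO, which matches the "Input/output error" reported by mount.lustre.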

Mounting Lustre on clients running kernel 3.10.0-862.11.6.el7 over normal Ethernet works fine.

Downgrading the kernel packages to 3.10.0-862.9.1.el7 allows the clients to mount over OPA.

Omni-Path itself looks fine: IPoIB is working, the server addresses are pingable, etc.  opainfo shows the link status is OK, and IMB test jobs run OK.
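
Since an IP-level ping only exercises IPoIB and not the RDMA path that the o2ib LNet driver uses, an LNet-level check may help narrow things down. The sketch below wraps `lctl ping`, which pings a NID through LNet itself. It assumes lctl is on the PATH, the LNet modules are loaded, and the script runs as root. The function names, the 30-second timeout, and the injectable `run` argument are illustrative choices, and the NIDs are the two server addresses from the mount command.

```python
import subprocess


def lnet_ping_command(nid):
    """Build the argv for an LNet-level ping of one NID."""
    return ["lctl", "ping", nid]


def lnet_ping(nid, timeout_s=30, run=subprocess.run):
    """Return True if `lctl ping` exits with status 0 for the given NID."""
    try:
        result = run(lnet_ping_command(nid), capture_output=True,
                     text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        # lctl missing, not executable, or no reply within the timeout.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for nid in ("10.112.0.11@o2ib", "10.112.0.12@o2ib"):
        print(nid, "reachable" if lnet_ping(nid) else "unreachable")
```

If this fails on the 862.11.6 kernel but succeeds on 862.9.1, that would point at the LNet/o2ib layer rather than at the fabric or the Lustre configuration.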

It would be helpful to know if anyone else with OPA is also seeing problems, or if it's just a problem with our setup.

Cheers,

Anthony.

--
Dr Anthony Brookfield
Research Computing Infrastructure Specialist
CiCS, University of Sheffield.

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
