Hi all, I am having an issue with the Lustre client pinging the server over o2ib. Does anyone have a suggestion as to what the problem could be? Thanks in advance.
Lustre client pinging the server over o2ib fails:
[root@n0 ~]# lctl ping 192.168.13.8@o2ib
failed to ping 192.168.13.8@o2ib: Input/output error   <<<<<<<

Lustre client pinging the server over IPoIB works:
[root@n0 ~]# ping -c 1 192.168.13.8
PING 192.168.13.8 (192.168.13.8) 56(84) bytes of data.
64 bytes from 192.168.13.8: icmp_seq=1 ttl=64 time=0.376 ms

Lustre client pinging itself or another client over o2ib works:
[root@n0 ~]# lctl ping 192.168.13.54@o2ib
12345-0@lo
12345-192.168.13.54@o2ib

Lustre client pinging itself or another client over IPoIB works:
[root@n0 ~]# ping -c 1 192.168.13.54
PING 192.168.13.54 (192.168.13.54) 56(84) bytes of data.
64 bytes from 192.168.13.54: icmp_seq=1 ttl=64 time=0.017 ms

Both the Lustre server and the client set the LNet network in /etc/modprobe.conf:
options lnet networks=o2ib(ib0)

The client reports errors when I try to ping or mount from the client to the server:
modprobe lustre lnet
lctl ping 192.168.13.8@o2ib
mount -v -t lustre 192.168.13.8@o2ib:/zfs /mnt/zfs

[root@n0 ~]# dmesg|tail
[589805.093447] Lustre: Lustre: Build Version: 2.11.54
[589805.272652] LNet: Using FastReg for registration
[589805.275954] LNet: Added LNI 192.168.13.54@o2ib [8/256/0/180]
[589813.278370] LNet: 22357:0:(o2iblnd_cb.c:3320:kiblnd_check_conns()) Timed out tx for 192.168.13.186@o2ib: 589813 seconds
[589835.518404] LustreError: 22463:0:(mgc_request.c:251:do_config_log_add()) MGC192.168.13.8@o2ib: failed processing log, type 1: rc = -5
[589843.118385] LustreError: 22488:0:(mgc_request.c:601:do_requeue()) failed processing log: -5
[589866.718389] LustreError: 15c-8: MGC192.168.13.8@o2ib: The configuration from log 'zfs-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[589866.741623] Lustre: Unmounted zfs-client
[589867.278516] LustreError: 22463:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount (-5)

The server reports this error during mounting:
[root@license ~]#
Sep 4 07:26:56 license kernel: LNet: 25518:0:(o2iblnd_cb.c:2475:kiblnd_passive_connect()) Can't accept conn from 192.168.13.54@o2ib (version 12): max_frags 16 incompatible without FMR pool (256 wanted)

The Lustre server setup:
[root@license ~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
zfs-MDT0000_UUID          863.4M        7.5M      853.9M   1% /mnt/zfs[MDT:0]
zfs-OST0000_UUID            1.7T       10.0G        1.7T   1% /mnt/zfs[OST:0]

filesystem_summary:         1.7T       10.0G        1.7T   1% /mnt/zfs

server: RHEL 7.5 (3.10.0-862.el7.x86_64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54
client: RHEL 7.5 (4.14.0-49.el7a.aarch64), MLNX_OFED_LINUX-4.4-2.0.7.0, lustre 2.11.54

Regards,
- Pak
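The two numbers in the server message (max_frags 16, 256 wanted) can be related to memory page size with simple arithmetic. This is a sketch under assumptions not confirmed in this thread: that o2iblnd counts one RDMA fragment per page of a 1 MiB (1048576-byte) transfer, that the x86_64 server uses 4 KiB pages, and that the aarch64 client kernel (4.14.0-49.el7a.aarch64) uses 64 KiB pages. The helper name `frags_for_page` is hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumption (not confirmed in this thread): o2iblnd needs one
# RDMA fragment per page of a 1 MiB (1048576-byte) transfer.
PAYLOAD_BYTES=1048576

# Hypothetical helper: fragments needed for one payload at a given page size.
frags_for_page() {
    echo $((PAYLOAD_BYTES / $1))
}

# Assumed page sizes: 4 KiB on the x86_64 server, 64 KiB on the aarch64 client.
for page in 4096 65536; do
    echo "page size ${page} bytes: $(frags_for_page "$page") fragments"
done

# Page size of the node this runs on (run on both client and server to compare).
local_page=$(getconf PAGESIZE)
echo "this node: page size ${local_page} bytes, $(frags_for_page "$local_page") fragments"
```

Under these assumptions, 4096-byte pages give 256 fragments and 65536-byte pages give 16. If `getconf PAGESIZE` reports 65536 on the client and 4096 on the server, that would be consistent with the server message, but it does not by itself confirm the cause.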
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
