You mentioned that the servers are on the o2ib0 network, but the error messages show the client trying to reach the MDT on the tcp network (10.7.29.130@tcp). The file system configuration still references the old NIDs and needs to be updated to use the new ones.
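To make the mismatch concrete, here is a minimal shell sketch that pulls the target NID out of one of the timeout messages and compares its network with the one the servers are said to be on. It assumes the log format seen in the syslog excerpt in the original message; `target_nid`, `expected_net` and `status` are names made up for illustration and are not Lustre tools.

```shell
#!/bin/sh
# Sketch: find which NID the client is trying to reach, based on a
# ptlrpc_expire_one_request() message, and check its LNet network.
# The log line is copied from the syslog excerpt in the original message.

log='Lustre: 3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1468246532/real 1468246532] req@ffff88032a3f9400 x1539566484848752/t0(0) o38->tlustre-MDT0000-mdc-ffff88032ad20400@10.7.29.130@tcp:12/10 lens 400/544 e 0 to 1 dl 1468246537 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1'

# target_nid (hypothetical helper): print the NID that follows the
# "...-mdc-<hex>@" import name, up to the ":<portal>" suffix.
target_nid() {
    printf '%s\n' "$1" | sed -n 's/.*-mdc-[0-9a-f]*@\([^:]*\):.*/\1/p'
}

nid=$(target_nid "$log")
net=${nid##*@}        # text after the last "@" is the LNet network
expected_net="o2ib"   # assumption: the servers should be reached over o2ib0

echo "client is targeting $nid (network: $net)"
case "$net" in
    "$expected_net"*)
        status=ok
        echo "target NID is on the expected network" ;;
    *)
        status=mismatch
        echo "target NID is on '$net', expected '$expected_net'" ;;
esac
```

On the live system, `lctl list_nids` on the MDS shows the NIDs the server actually has, and `lctl ping 10.7.129.130@o2ib0` from the client is a quick check that the route through the router works. If the recorded NIDs do turn out to be stale, the Lustre manual describes regenerating the configuration logs with `tunefs.lustre --writeconf`; check the details against the manual for your version before running it.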
Doug

> On Jul 11, 2016, at 7:34 AM, Jessica Otey <jo...@nrao.edu> wrote:
>
> All,
> I am, as before, working on a small test lustre setup (RHEL 6.8, lustre v.
> 2.4.3) to prepare for upgrading a 1.8.9 lustre production system to 2.4.3
> (first the servers and lnet routers, then at a subsequent time, the clients).
> Lustre servers have IB connections, but the clients are 1G ethernet only.
>
> For the life of me, I cannot get the client to mount via the router on this
> test system. (Client will mount fine when router is taken out of the
> equation.) This is the error I am seeing in the syslog from the mount attempt:
>
> Jul 11 10:15:37 tlclient kernel: Lustre: 3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1468246532/real 1468246532] req@ffff88032a3f9400 x1539566484848752/t0(0) o38->tlustre-MDT0000-mdc-ffff88032ad20400@10.7.29.130@tcp:12/10 lens 400/544 e 0 to 1 dl 1468246537 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
> Jul 11 10:16:07 tlclient kernel: Lustre: 3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1468246557/real 1468246557] req@ffff880629819000 x1539566484848764/t0(0) o38->tlustre-MDT0000-mdc-ffff88032ad20400@10.7.29.130@tcp:12/10 lens 400/544 e 0 to 1 dl 1468246567 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
> Jul 11 10:16:37 tlclient kernel: Lustre: 3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1468246582/real 1468246582] req@ffff88062a371000 x1539566484848772/t0(0) o38->tlustre-MDT0000-mdc-ffff88032ad20400@10.7.29.130@tcp:12/10 lens 400/544 e 0 to 1 dl 1468246597 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
> Jul 11 10:16:44 tlclient kernel: LustreError: 2511:0:(lov_obd.c:937:lov_cleanup()) lov tgt 0 not cleaned! deathrow=0, lovrc=1
> Jul 11 10:16:44 tlclient kernel: Lustre: Unmounted tlustre-client
> Jul 11 10:16:44 tlclient kernel: LustreError: 4881:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount (-4)
>
> More than one pair of eyes has looked at the configs and confirmed they look
> okay. But frankly we've got to be missing something since this should (like
> lustre on a good day) 'just work'.
>
> If anyone has seen this issue before and could give some advice, it'd be
> appreciated. One major question I have is whether the problem is a
> configuration issue or a procedure issue--perhaps the order in which I am
> doing things is causing the failure? The order I'm following currently is:
>
> 1) unmount/remove modules on all boxes
> 2) bring up the lnet modules on the router, and bring up the network
> 3) On the mds: add the modules, bring up the network, mount the mdt
> 4) On the oss: add the modules, bring up the network, mount the oss
> 5) On the client: add the modules, bring up the network, attempt to mount
>    client (fails)
>
> Configs follow below.
>
> Thanks in advance,
> Jessica
>
> tlnet (the router)
> [root@tlnet ~]# cat /etc/modprobe.d/lustre.conf
> # tlnet configuration
> alias ib0 ib_ipoib
> alias net-pf-27 ib_sdp
> options lnet networks="o2ib0(ib0),tcp0(em1)" forwarding="enabled"
>
> [root@tlnet ~]# ifconfig #lo omitted
> em1       Link encap:Ethernet HWaddr 78:2B:CB:25:A7:E2
>           inet addr:10.7.29.134 Bcast:10.7.29.255 Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>           RX packets:453441 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:264313 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:436188202 (415.9 MiB) TX bytes:22274957 (21.2 MiB)
> ib0       Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>           inet addr:10.7.129.134 Bcast:10.7.129.255 Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
>           RX packets:650 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:256
>           RX bytes:75376 (73.6 KiB) TX bytes:2904 (2.8 KiB)
>
> tlclient (the client)
> [root@tlclient ~]# cat /etc/modprobe.d/lustre.conf
> options lnet networks="tcp0(em1)" routes="o2ib0 10.7.29.134@tcp0" live_router_check_interval=60 dead_router_check_interval=60
>
> [root@tlclient ~]# ifconfig #lo omitted
> em1       Link encap:Ethernet HWaddr 00:26:B9:35:B1:1A
>           inet addr:10.7.29.132 Bcast:10.7.29.255 Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>           RX packets:2817 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:2233 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:354856 (346.5 KiB) TX bytes:328782 (321.0 KiB)
>
> [root@tlclient ~]# cat /etc/fstab | grep lustre
> 10.7.129.130@o2ib0:/tlustre /testlustre lustre defaults,noauto,user_xattr,flock 0 0
>
> tlmds/tloss (mdt and oss)
> [root@tloss ~]# cat /etc/modprobe.d/lustre.conf
> alias ib0 ib_ipoib
> alias net-pf-27 ib_sdp
> options lnet networks="o2ib0(ib0)" routes="tcp0 10.7.129.134@o2ib0" live_router_check_interval="60" dead_router_check_interval="60"
>
> tloss ifconfig
> [root@tloss ~]# ifconfig #lo omitted
> em1       Link encap:Ethernet HWaddr 78:2B:CB:4A:7A:F8
>           inet addr:10.7.29.131 Bcast:10.7.29.255 Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>           RX packets:7939328 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:4920595 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:7016088640 (6.5 GiB) TX bytes:447490407 (426.7 MiB)
> ib0       Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>           inet addr:10.7.129.131 Bcast:10.7.129.255 Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
>           RX packets:484688 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:62465 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:256
>           RX bytes:845062706 (805.9 MiB) TX bytes:919378780 (876.7 MiB)
>
> tlmds ifconfig
> [root@tlmds ~]# ifconfig #lo omitted
> em1       Link encap:Ethernet HWaddr 78:2B:CB:28:1D:00
>           inet addr:10.7.29.130 Bcast:10.7.29.255 Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>           RX packets:7849519 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:4847566 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:7049031324 (6.5 GiB) TX bytes:484594569 (462.1 MiB)
>
> ib0       Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>           inet addr:10.7.129.130 Bcast:10.7.129.255 Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
>           RX packets:532171 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:64114 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:256
>           RX bytes:946230130 (902.3 MiB) TX bytes:821297144 (783.2 MiB)
>
> --
> Jessica Otey
> System Administrator II
> North American ALMA Science Center (NAASC)
> National Radio Astronomy Observatory (NRAO)
> Charlottesville, Virginia (USA)
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org