Do you have the route to o2ib via 10.215.25.76@o2ib2 defined on the client?
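Reading the configs posted in this thread, the route has to exist in both directions. The server already routes o2ib2 via 10.212.1.11@o2ib. The client, however, only has a local NI on o2ib2 and no route to o2ib, so it cannot send to 10.212.14.9@o2ib. That presumably also explains why the server's ping to the client fails: the reply has no path back.

The Python sketch that follows is a hypothetical, much-simplified model of that check, not LNet's actual routing code. The names `Node`, `can_send` and `missing_routes` are made up for illustration. It assumes the router forwards traffic, and it ignores hop counts, priorities and router health. It uses the NIDs and routes from the thread and prints the `lnetctl route add` command that would fill the gap.

```python
from dataclasses import dataclass, field


def net_of(nid: str) -> str:
    """Return the LNet network name of a NID, e.g. "10.212.14.9@o2ib" -> "o2ib"."""
    return nid.split("@", 1)[1]


@dataclass
class Node:
    """Hypothetical, simplified LNet view of one host (illustration only)."""
    name: str
    nids: list[str]  # local NIDs, one per configured network
    # remote net -> list of gateway NIDs, mirroring `lnetctl route show`
    routes: dict[str, list[str]] = field(default_factory=dict)

    @property
    def nets(self) -> set[str]:
        return {net_of(n) for n in self.nids}


def can_send(src: Node, dst_nid: str) -> bool:
    """True if the destination net is local to src, or src has a route to it
    whose gateway sits on one of src's local nets."""
    dst_net = net_of(dst_nid)
    if dst_net in src.nets:
        return True
    return any(net_of(gw) in src.nets for gw in src.routes.get(dst_net, []))


def missing_routes(a: Node, b: Node, router: Node) -> list[str]:
    """Check both directions (request and reply) and suggest the missing
    `lnetctl route add` commands, using the router NID on the sender's net."""
    cmds = []
    for src, dst in ((a, b), (b, a)):
        dst_nid = dst.nids[0]
        if can_send(src, dst_nid):
            continue
        gw = next((n for n in router.nids if net_of(n) in src.nets), None)
        if gw is not None:
            cmds.append(f"[{src.name}] lnetctl route add --net {net_of(dst_nid)} --gateway {gw}")
    return cmds


# NIDs and routes as posted in this thread.
server = Node("server", ["10.212.14.9@o2ib"], {
    "o2ib1": ["10.212.15.16@o2ib", "10.212.16.20@o2ib"],
    "o2ib2": ["10.212.1.11@o2ib"],
})
router = Node("router", ["10.212.1.11@o2ib", "10.215.25.76@o2ib2"])
client = Node("client", ["10.215.25.74@o2ib2"])  # no routes configured

print(can_send(server, "10.215.25.74@o2ib2"))  # True: o2ib2 is routed via the router
print(can_send(client, "10.212.14.9@o2ib"))    # False: no route to o2ib
for cmd in missing_routes(server, client, router):
    print(cmd)  # [client] lnetctl route add --net o2ib --gateway 10.215.25.76@o2ib2
```

On the real client, the runtime equivalent would be `lnetctl route add --net o2ib --gateway 10.215.25.76@o2ib2`. A route added that way does not survive an LNet reload, so it would also need to go into the persistent configuration. Two options are the `routes=` option of the `lnet` module (e.g. `routes="o2ib 10.215.25.76@o2ib2"`) or a YAML file loaded with `lnetctl import`.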
Chris Horn

From: lustre-discuss <[email protected]> on behalf of Kumar, Amit via lustre-discuss <[email protected]>
Date: Wednesday, April 5, 2023 at 12:28 PM
To: [email protected] <[email protected]>
Subject: [lustre-discuss] Lnet config serving multiple routers and clients

Dear Lustre team,

The Lustre server below (one of many) already serves the current file system to another cluster via LNet routers over o2ib1. We are now adding a new router to serve a new cluster and its clients over o2ib2.

From the server I can ping both the o2ib and o2ib2 NIDs of the LNet router. Likewise, from the client I can ping both the o2ib and o2ib2 NIDs on the router. But end to end, the client and server cannot find a route to each other.

Initially I thought it might be related to LU-11641. However, since I can reach both NIDs on the immediate peer, I am guessing it is something in my config. I wanted to see if a second set of eyes could point out what I could be doing wrong. Any ideas?

Versions:
  Server: lustre-2.12.5-1.el7.x86_64 on CentOS 7.8
  LNet router: lustre-client-2.12.5-1.el7.x86_64 on CentOS 7.8
  Client: lustre-client-2.14.0 on *Ubuntu 22.04*

NIDs:
  Server: 10.212.14.9@o2ib
  LNet router: 10.212.1.11@o2ib & 10.215.25.76@o2ib2
  Client: 10.215.25.74@o2ib2

********** Server **********

# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.212.14.9@o2ib
          status: up
          interfaces:
              0: ib0

# lnetctl route show
route:
    - net: o2ib1
      gateway: 10.212.15.16@o2ib
    - net: o2ib1
      gateway: 10.212.16.20@o2ib
    - net: o2ib2
      gateway: 10.212.1.11@o2ib

# lnetctl ping 10.212.1.11@o2ib
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.215.25.76@o2ib2
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.215.25.74@o2ib2
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.215.25.74@o2ib2: Input/output error

********** LNet router **********

# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.212.1.11@o2ib
          status: up
          interfaces:
              0: ib1
    - net type: o2ib2
      local NI(s):
        - nid: 10.215.25.76@o2ib2
          status: down
          interfaces:
              0: ib0

# lnetctl discover 10.212.1.11@o2ib
discover:
    - primary nid: 10.212.1.11@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl discover 10.215.25.76@o2ib2
discover:
    - primary nid: 10.212.1.11@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.212.14.9@o2ib
ping:
    - primary nid: 10.212.14.9@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.14.9@o2ib

# lnetctl ping 10.215.25.75@o2ib2
ping:
    - primary nid: 10.215.25.75@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.75@o2ib2

# lnetctl peer show
peer:
    - primary nid: 10.215.25.74@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.74@o2ib2
          state: up
    - primary nid: 10.212.1.11@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
          state: up
        - nid: 10.215.25.76@o2ib2
          state: up
    - primary nid: 10.215.25.75@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.75@o2ib2
          state: up
    - primary nid: 10.212.14.9@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.14.9@o2ib
          state: up

********** Client **********

# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib2
      local NI(s):
        - nid: 10.215.25.74@o2ib2
          status: up
          interfaces:
              0: ibp129s0f1

# lnetctl discover 10.212.1.11@o2ib
discover:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.76@o2ib2
        - nid: 10.212.1.11@o2ib

# lnetctl ping 10.215.25.76@o2ib2
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.212.1.11@o2ib
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.212.14.9@o2ib
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.212.14.9@o2ib: Input/output error

root@lnet1:~# lnetctl discover 10.212.14.9@o2ib
manage:
    - discover:
          errno: -1
          descr: failed to discover 10.212.14.9@o2ib: No route to host

Thank you,
Amit
