Yes, I do, but the route is down, the same state it shows on the server. I have
also set routing to 1 in LNet:
# lnetctl route show -v
route:
    - net: o2ib
      gateway: 10.215.25.76@o2ib2
      hop: -1
      priority: 0
      health_sensitivity: 1
      state: down
      type: single-hop
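For anyone scripting this check across many nodes, here is a minimal Python sketch that scans saved `lnetctl route show -v` output and reports routes whose state is not `up`. The sample text mirrors the output quoted above; `parse_routes` and `down_routes` are hypothetical helper names, and the line-based parser is an assumption that only handles the flat key/value layout shown here, not arbitrary YAML.

```python
# Sketch: list routes that `lnetctl route show -v` reports as not up.
# SAMPLE copies the output quoted in this message; the parser is a
# hand-rolled assumption and is not part of lnetctl.

SAMPLE = """\
route:
    - net: o2ib
      gateway: 10.215.25.76@o2ib2
      hop: -1
      priority: 0
      health_sensitivity: 1
      state: down
      type: single-hop
"""


def parse_routes(text):
    """Return one dict per '- net:' entry in the route listing."""
    routes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "route:":
            continue
        if line.startswith("- "):
            routes.append({})  # a leading dash starts a new route entry
            line = line[2:]
        if not routes:
            continue  # ignore anything before the first entry
        key, _, value = line.partition(":")
        routes[-1][key.strip()] = value.strip()
    return routes


def down_routes(text):
    """Routes whose 'state' is missing or anything other than 'up'."""
    return [r for r in parse_routes(text) if r.get("state") != "up"]


for r in down_routes(SAMPLE):
    print(f"route to {r['net']} via {r['gateway']} is {r.get('state', 'unknown')}")
# prints: route to o2ib via 10.215.25.76@o2ib2 is down
```

Note that output from `lnetctl route show` without `-v` carries no `state` field, so this sketch would report every such route as unknown rather than up.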
From: Horn, Chris <[email protected]>
Sent: Wednesday, April 5, 2023 12:33 PM
To: Kumar, Amit <[email protected]>; [email protected]
Subject: Re: Lnet config serving multiple routers and clients
Do you have the route to o2ib via 10.215.25.76@o2ib2 defined on the client?
Chris Horn
From: lustre-discuss <[email protected]>
on behalf of Kumar, Amit via lustre-discuss <[email protected]>
Date: Wednesday, April 5, 2023 at 12:28 PM
To: [email protected] <[email protected]>
Subject: [lustre-discuss] Lnet config serving multiple routers and clients
Dear Lustre team,
The Lustre server below (one of many) already serves the current file system
to another cluster via LNet routers over o2ib1.
We are now adding a new router to serve a new cluster and its clients over
o2ib2.
From the server I can ping both the o2ib and o2ib2 NIDs of the LNet router;
likewise, from the client I can ping both the o2ib and o2ib2 NIDs of the
router. But end-to-end communication between client and server fails: neither
can find a route to the other.
Initially I thought it might be related to LU-11641, but since I can reach both
NIDs on the immediate peer, I am guessing it is something in my config. I
wanted to see if a second set of eyes could point out what I could be doing
wrong. Any ideas?
Server: lustre-2.12.5-1.el7.x86_64 on CentOS 7.8
LNet router: lustre-client-2.12.5-1.el7.x86_64 on CentOS 7.8
Client: lustre-client-2.14.0 on *Ubuntu 22.04*
Server (10.212.14.9@o2ib)
LNet router (10.212.1.11@o2ib & 10.215.25.76@o2ib2)
Client (10.215.25.74@o2ib2)
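As a way to reason about this topology, the following Python sketch models each node's local LNet networks and static routes and checks whether a single-hop routed path exists in each direction. The NIDs are the ones listed above; the client's route to o2ib is the one quoted in the reply at the top of this thread, and `routing: True` on the router is an assumption based on that same reply. The `NODES` layout and the `can_send` helper are hypothetical names for illustration. The sketch only checks static configuration; it does not model NI or route health.

```python
# Sketch: static reachability check for a single-hop LNet routed setup.
# Assumptions: router has routing enabled; the server's o2ib1 routes are
# omitted because their gateways are not part of this three-node model.

def net_of(nid):
    """'10.212.14.9@o2ib' -> 'o2ib'."""
    return nid.split("@", 1)[1]


NODES = {
    "server": {"nids": ["10.212.14.9@o2ib"],
               "routes": {"o2ib2": ["10.212.1.11@o2ib"]},
               "routing": False},
    "router": {"nids": ["10.212.1.11@o2ib", "10.215.25.76@o2ib2"],
               "routes": {},
               "routing": True},
    "client": {"nids": ["10.215.25.74@o2ib2"],
               "routes": {"o2ib": ["10.215.25.76@o2ib2"]},
               "routing": False},
}


def owner_of(nid, nodes):
    """Name of the node that owns this NID, or None if it is unknown."""
    for name, node in nodes.items():
        if nid in node["nids"]:
            return name
    return None


def can_send(src, dst_nid, nodes):
    """True if src is on dst_nid's net or has a usable route to that net."""
    local_nets = {net_of(n) for n in nodes[src]["nids"]}
    target = net_of(dst_nid)
    if target in local_nets:
        return True
    for gw in nodes[src]["routes"].get(target, []):
        gw_owner = owner_of(gw, nodes)
        if gw_owner is None or not nodes[gw_owner]["routing"]:
            continue
        gw_nets = {net_of(n) for n in nodes[gw_owner]["nids"]}
        # The gateway NID must be on a net local to src, and the gateway
        # node must also have an interface on the target net.
        if net_of(gw) in local_nets and target in gw_nets:
            return True
    return False


print(can_send("client", "10.212.14.9@o2ib", NODES))    # True
print(can_send("server", "10.215.25.74@o2ib2", NODES))  # True
```

Under these assumptions both directions have a configured path, which suggests looking at interface and route state (the `status` and `state` fields in the outputs) rather than at missing route entries.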
********** Server: ******************
# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.212.14.9@o2ib
          status: up
          interfaces:
              0: ib0
# lnetctl route show
route:
    - net: o2ib1
      gateway: 10.212.15.16@o2ib
    - net: o2ib1
      gateway: 10.212.16.20@o2ib
    - net: o2ib2
      gateway: 10.212.1.11@o2ib
# lnetctl ping 10.212.1.11@o2ib
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2
# lnetctl ping 10.215.25.76@o2ib2
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2
# lnetctl ping 10.215.25.74@o2ib2
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.215.25.74@o2ib2: Input/output error
************LNET router ****************
# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.212.1.11@o2ib
          status: up
          interfaces:
              0: ib1
    - net type: o2ib2
      local NI(s):
        - nid: 10.215.25.76@o2ib2
          status: down
          interfaces:
              0: ib0
# lnetctl discover 10.212.1.11@o2ib
discover:
    - primary nid: 10.212.1.11@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2
# lnetctl discover 10.215.25.76@o2ib2
discover:
    - primary nid: 10.212.1.11@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2
# lnetctl ping 10.212.14.9@o2ib
ping:
    - primary nid: 10.212.14.9@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.14.9@o2ib
# lnetctl ping 10.215.25.75@o2ib2
ping:
    - primary nid: 10.215.25.75@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.75@o2ib2
# lnetctl peer show
peer:
    - primary nid: 10.215.25.74@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.74@o2ib2
          state: up
    - primary nid: 10.212.1.11@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
          state: up
        - nid: 10.215.25.76@o2ib2
          state: up
    - primary nid: 10.215.25.75@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.75@o2ib2
          state: up
    - primary nid: 10.212.14.9@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.14.9@o2ib
          state: up
************* Client *****************
# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib2
      local NI(s):
        - nid: 10.215.25.74@o2ib2
          status: up
          interfaces:
              0: ibp129s0f1
# lnetctl discover 10.212.1.11@o2ib
discover:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.76@o2ib2
        - nid: 10.212.1.11@o2ib
# lnetctl ping 10.215.25.76@o2ib2
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2
# lnetctl ping 10.212.1.11@o2ib
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2
# lnetctl ping 10.212.14.9@o2ib
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.212.14.9@o2ib: Input/output error
root@lnet1:~# lnetctl discover 10.212.14.9@o2ib
manage:
    - discover:
          errno: -1
          descr: failed to discover 10.212.14.9@o2ib: No route to host
Thank you,
Amit
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org