On 3/6/19 11:29 AM, Amir Shehata wrote:
The reason the load is being split across tcp and o2ib0 for the
2.12 client is that the Multi-Rail (MR) code sees both interfaces,
realizes it can use both of them, and so it does.
To disable this behavior you can disable discovery on the 2.12 client.
I think that should get the client to use only the single
interface it's told to.
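For reference, a minimal sketch of disabling discovery on the client. The runtime command is the one discussed later in this thread; the modprobe file path and the lnet_peer_discovery_disabled module parameter are shown as one way to persist the setting across reboots, and should be verified against your Lustre version and distribution:

```shell
# Runtime: turn off dynamic peer discovery on the 2.12 client
lnetctl set discovery 0

# Verify the current global settings, including discovery
lnetctl global show

# Persist across reboots (assumed path; adjust for your distribution)
echo 'options lnet lnet_peer_discovery_disabled=1' >> /etc/modprobe.d/lustre.conf
```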
Thank you very much, this worked out well.
We're currently working on a feature (UDSP) which will allow the
specification of a "preferred" network. In your case you can set the
o2ib network to be the preferred one. It'll always be used unless it
becomes unavailable. You get two benefits this way: 1) your preference
is adhered to; 2) reliability, since the tcp network will be used if
the o2ib network becomes unavailable.
This feature (UDSP) would be really great.
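UDSP was still an upcoming feature at the time of this thread; in later Lustre releases that ship it, a rule preferring the o2ib network might look like the following. The exact syntax is an assumption here and should be checked against `lnetctl udsp --help` on your version:

```shell
# Prefer the o2ib network for outgoing traffic
# (lower priority value = more preferred)
lnetctl udsp add --src o2ib --priority 0

# List the configured UDSP rules
lnetctl udsp show
```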
Let me know if disabling discovery on your 2.12 clients works.
Yes, after disabling discovery on the client side, the situation is much
better.
Thank you very much.
thanks
amir
On Tue, 5 Mar 2019 at 18:49, Riccardo Veraldi
<[email protected] <mailto:[email protected]>>
wrote:
Hello Amir i answer in-line
On 3/5/19 3:42 PM, Amir Shehata wrote:
It looks like the ping is passing. Did you try it several times
to make sure it always pings successfully?
The way it works is the MDS (2.12) discovers all the interfaces
on the peer. There is a concept of the primary NID for the peer:
that's the first interface configured on the peer. In your case
it's the o2ib NID. So when you do lnetctl peer show you'll see
Primary NID: <nid>@o2ib.
- primary nid: 172.21.52.88@o2ib
Multi-Rail: True
peer ni:
- nid: 172.21.48.250@tcp
state: NA
- nid: 172.21.52.88@o2ib
state: NA
- nid: 172.21.48.250@tcp1
state: NA
- nid: 172.21.48.250@tcp2
state: NA
On the MDS it uses the primary_nid to identify the peer. So you
can ping using the Primary NID. LNet will resolve the Primary NID
to the tcp NID. As you can see in the logs, it never actually
talks over o2ib. It ends up talking to the peer on its TCP NID,
which is what you want to do.
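The ping-by-primary-NID behavior described above can be exercised directly; the NID here is taken from the peer output shown earlier in the thread:

```shell
# Ping the peer by its primary NID; LNet decides which of the
# peer's interfaces (o2ib or tcp) actually carries the traffic
lctl ping 172.21.52.88@o2ib
```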
I think the problem you're seeing is caused by the combination of
2.12 and 2.10.x.
From what I understand your servers are 2.12 and your clients are
2.10.x.
My clients are 2.10.5, but this problem arises also with one
2.12.0 client; in any case, the combination of 2.10.x clients and
2.12.0 servers is not working right.
Can you try disabling dynamic discovery on your servers:
lnetctl set discovery 0
I did this on the MDS and OSS. I did not disable discovery on the
client side.
Now on the MDS side lnetctl peer show looks right.
Anyway, on the client side, where I have both IB and tcp, if I write
to the Lustre filesystem (OSS), what happens is that the write
operation is split/load-balanced between IB and tcp (Ethernet),
and I do not want this. I would like only IB to be used
when the client writes data to the OSS. But both peer NIs
(o2ib, tcp) are seen from the 2.12.0 client and traffic goes to
both of them, thus reducing performance because IB is not fully
used. This does not happen with a 2.10.5 client writing to the same
2.12.0 OSS.
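Beyond disabling discovery, another way to guarantee the client only ever uses IB is to configure LNet with just the o2ib network. This is a sketch using the standard lnet module parameter; the interface name (ib0) and file path are assumptions to adapt to your node:

```shell
# Configure LNet on the client with only the IB network,
# so no tcp NI exists for Multi-Rail to spread traffic onto
echo 'options lnet networks="o2ib(ib0)"' > /etc/modprobe.d/lnet.conf
```

Note this removes the tcp NID entirely, so you also lose the fallback path that the UDSP preference approach would keep.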
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org