no problem

On Wed, 6 Mar 2019 at 12:15, Riccardo Veraldi <[email protected]> wrote:
> On 3/6/19 11:29 AM, Amir Shehata wrote:
>
> The reason the load is split across tcp and o2ib0 on the 2.12 client
> is that the MR code sees both interfaces, realizes it can use both of
> them, and so it does.
> To disable this behavior you can disable discovery on the 2.12 client.
> I think that should just get the client to only use the single
> interface it's told to.
>
> thank you very much, this worked out well.
>
> We're currently working on a feature (UDSP) which will allow the
> specification of a "preferred" network. In your case you can set the
> o2ib to be the preferred network. It'll always be used unless it
> becomes unavailable. You get two benefits this way: 1) your preference
> is adhered to. 2) reliability, since the tcp network will be used if
> the o2ib network becomes unavailable.
>
> this feature (UDSP) would be really great.
>
> Let me know if disabling discovery on your 2.12 clients works.
>
> yes, after disabling discovery on the client side the situation is
> much better
>
> thank you very much
>
> thanks
> amir
>
> On Tue, 5 Mar 2019 at 18:49, Riccardo Veraldi <[email protected]> wrote:
>
>> Hello Amir, I answer in-line
>>
>> On 3/5/19 3:42 PM, Amir Shehata wrote:
>>
>> It looks like the ping is passing. Did you try it several times to
>> make sure it always pings successfully?
>>
>> The way it works is the MDS (2.12) discovers all the interfaces on
>> the peer. There is a concept of the primary NID for the peer. That's
>> the first interface configured on the peer. In your case it's the
>> o2ib NID. So when you do lnetctl peer show you'll see
>> Primary NID: <nid>@o2ib.
>>
>>     - primary nid: 172.21.52.88@o2ib
>>       Multi-Rail: True
>>       peer ni:
>>         - nid: 172.21.48.250@tcp
>>           state: NA
>>         - nid: 172.21.52.88@o2ib
>>           state: NA
>>         - nid: 172.21.48.250@tcp1
>>           state: NA
>>         - nid: 172.21.48.250@tcp2
>>           state: NA
>>
>> On the MDS it uses the primary_nid to identify the peer.
>> So you can ping using the Primary NID. LNet will resolve the Primary
>> NID to the tcp NID. As you can see in the logs, it never actually
>> talks over o2ib. It ends up talking to the peer on its TCP NID, which
>> is what you want to do.
>>
>> I think the problem you're seeing is caused by the combination of
>> 2.12 and 2.10.x.
>> From what I understand your servers are 2.12 and your clients are
>> 2.10.x.
>>
>> my clients are 2.10.5, but this problem also arises with one 2.12.0
>> client; anyway, the combination of 2.10.0 clients and 2.12.0 is not
>> working right
>>
>> Can you try disabling dynamic discovery on your servers:
>>     lnetctl set discovery 0
>>
>> I did this on the MDS and OSS. I did not disable discovery on the
>> client side.
>>
>> now on the MDS side lnetctl peer show looks right.
>>
>> Anyway, on the client side, where I have both IB and tcp, if I write
>> to the Lustre filesystem (OSS) the write operation is split/load
>> balanced between IB and tcp (Ethernet), and I do not want this. I
>> would like only IB to be used when the client writes data to the OSS,
>> but both peer NIs (o2ib, tcp) are seen from the 2.12.0 client and
>> traffic goes to both of them, thus reducing performance because IB is
>> not fully used. This does not happen with a 2.10.5 client writing to
>> the same 2.12.0 OSS
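
To see at a glance which networks a peer is reachable on, the peer listing quoted in this thread can be grouped by network. The following is only a minimal sketch, assuming the output has exactly the layout quoted above; the function name `nids_by_network` and the `SAMPLE` constant are illustrative and are not part of lnetctl.

```python
import re
from collections import defaultdict

# Sample text copied from the peer listing quoted in this thread.
SAMPLE = """\
- primary nid: 172.21.52.88@o2ib
  Multi-Rail: True
  peer ni:
    - nid: 172.21.48.250@tcp
      state: NA
    - nid: 172.21.52.88@o2ib
      state: NA
    - nid: 172.21.48.250@tcp1
      state: NA
    - nid: 172.21.48.250@tcp2
      state: NA
"""


def nids_by_network(text):
    """Group the '- nid:' entries of a peer listing by LNet network name.

    Hypothetical helper: it assumes one '- nid: <addr>@<net>' entry per
    line, as in the listing above, and ignores the 'primary nid' line.
    """
    groups = defaultdict(list)
    pattern = r"^\s*-\s+nid:\s+(\S+)@(\S+)\s*$"
    for addr, net in re.findall(pattern, text, re.M):
        groups[net].append(addr)
    return dict(groups)


if __name__ == "__main__":
    for net, addrs in nids_by_network(SAMPLE).items():
        print(net, addrs)
```

A peer that shows up under both an o2ib and a tcp network, as this one does, is the situation in which a 2.12 client with discovery enabled may spread traffic over both interfaces, as described in the thread.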
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
