Can you share the client’s CPT configuration?

$ lctl get_param cpu_partition_table cpu_partition_distance

Chris Horn

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Gwen Dawes via lustre-discuss <lustre-discuss@lists.lustre.org>
Date: Wednesday, February 14, 2024 at 11:19 AM
To: lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] LNet Multi-Rail config

Hi Chris,

Thanks for the pointers - apologies about kicking up an old thread, but
I'm running out of ideas for how to solve this one.

I switched everything out to 2.15.4 and carefully documented the build
process, then turned one of my machines into a VM host with PCI
passthrough to eliminate any NUMA issues and additional complexity.

So now I have a much simpler layout - one client, multi-rail, four
interfaces, with four servers (1 interface each), one of which is set
aside for coordinating lnet_selftest runs. Everything sees everything
else as a Multi-Rail peer, with 1 interface for my servers and 4 for my
client.
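
One way to confirm the Multi-Rail peer view described above is to count
the peer NIs listed under each primary NID. The sketch below is a
hypothetical helper, peer_rails, which assumes the YAML-style layout that
`lnetctl peer show` prints (a "primary nid:" line, a "Multi-Rail:" line,
then one "- nid:" entry per peer interface). Field names can differ
between Lustre versions, so treat it as illustrative rather than
definitive.

```shell
#!/bin/sh
# peer_rails: hypothetical helper, not part of Lustre.
# Reads `lnetctl peer show` output on stdin and prints, for each peer,
# its primary NID, its Multi-Rail flag, and how many peer NIs it lists.
# Assumes the field names "primary nid:", "Multi-Rail:" and "- nid:".
peer_rails() {
    awk '
        function emit() {
            if (have) printf "%s multi_rail=%s peer_nis=%d\n", peer, mr, n
            mr = "?"; n = 0; have = 0
        }
        # A new peer entry starts: report the previous one first.
        $1 == "-" && $2 == "primary" && $3 == "nid:" { emit(); peer = $4; have = 1 }
        $1 == "Multi-Rail:"                          { mr = $2 }
        # Each "- nid:" line under "peer ni:" is one interface of the peer.
        $1 == "-" && $2 == "nid:"                    { n++ }
        END { emit() }
    '
}
```

Run it as `lnetctl peer show | peer_rails` on the client and on each
server. In the layout described here, one would expect the servers to
report the client with peer_nis=4 and the client to report each server
with peer_nis=1.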

My client has two CPTs set up - with the cards attached to each CPU
assigned different 'dev cpt' numbers - 0 and 1 - as per their bus.
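
To see how the 'dev cpt' assignment lines up with actual traffic, the
per-interface counters can be listed next to each NI's device CPT. This
is a minimal sketch, assuming the verbose output of `lnetctl net show -v`
contains "nid:", "send_count:", "recv_count:" and "dev cpt:" lines for
each local NI. ni_counters is a hypothetical helper name, not part of
Lustre.

```shell
#!/bin/sh
# ni_counters: hypothetical helper, not part of Lustre.
# Reads `lnetctl net show -v` output on stdin and prints one line per
# local NI with its NID, send/receive counts, and device CPT.
# Assumes the field names "nid:", "send_count:", "recv_count:", "dev cpt:".
ni_counters() {
    awk '
        function emit() {
            if (have) printf "%s send=%s recv=%s dev_cpt=%s\n", nid, sent, recv, cpt
            sent = "?"; recv = "?"; cpt = "?"; have = 0
        }
        # A new local NI starts: report the previous one first.
        $1 == "-" && $2 == "nid:"   { emit(); nid = $3; have = 1 }
        $1 == "send_count:"         { sent = $2 }
        $1 == "recv_count:"         { recv = $2 }
        $1 == "dev" && $2 == "cpt:" { cpt = $3 }
        END { emit() }
    '
}
```

Running `lnetctl net show -v | ni_counters` before and after a test and
comparing the counts shows which interfaces carried traffic and which
CPT each of them is bound to.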

However, when I run an lnet_selftest 'read' test (concurrency 32, simple
check), even with distribute 1:3 set, I always see exactly two
interfaces in use. I can block them with net_drop_add, which eventually
forces the traffic onto the others, but it only ever seems to use two
interfaces.
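
For reference, a test like the one described could be set up roughly as
follows. This is a sketch, not the exact commands used in this thread:
the function name lst_read_plan, the session, group and batch names,
size=1M, and the 30-second sampling window are all assumptions, and the
NIDs are supplied by the caller. It only prints the lst commands so they
can be reviewed before being piped to sh on the node coordinating the
run (with the lnet_selftest module loaded on every node involved).

```shell
#!/bin/sh
# lst_read_plan: hypothetical helper that prints (does not run) the lst
# commands for a bulk read test with concurrency 32, check=simple and
# distribute 1:3.
# Usage: lst_read_plan CLIENT_NID SERVER_NID [SERVER_NID...]
# Session/group/batch names, size=1M and the 30 s window are assumptions.
lst_read_plan() {
    client=$1
    shift
    printf '%s\n' \
        'export LST_SESSION=$$' \
        'lst new_session mr_read' \
        "lst add_group clients $client" \
        "lst add_group servers $*" \
        'lst add_batch bulk' \
        'lst add_test --batch bulk --concurrency 32 --distribute 1:3 --from clients --to servers brw read check=simple size=1M' \
        'lst run bulk' \
        'lst stat clients servers & sleep 30; kill $!' \
        'lst end_session'
}
```

For example, `lst_read_plan <client NID> <server NIDs...> | sh` would run
the plan. Comparing per-interface send and receive counts on the client
before and after the run then shows how many rails were actually used.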

Is this a bug with lnet_selftest? Some kind of non-network bottleneck?

Am I misunderstanding CPTs?

Gwen


On Wed, 2024-01-17 at 17:53 +0000, Horn, Chris wrote:
> NRS only affects Lustre traffic, so it will not factor into
> lnet_selftest (LST) results.
>
> I gave some talks on troubleshooting multi-rail that you may want to
> review.
> Overview:
> https://youtu.be/j3m-mznUdac?feature=shared
> Demo:
> https://youtu.be/TLN56cw9Zgs?feature=shared
>
> You should probably start by verifying that the client and server see
> each other as multi-rail peers, and by checking the send and receive
> counts for each interface on your client and server to ensure that
> traffic is being spread across them.
>
> Chris Horn
>
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on
> behalf of Gwen Dawes via lustre-discuss
> <lustre-discuss@lists.lustre.org>
> Date: Wednesday, January 17, 2024 at 5:48 AM
> To: lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] LNet Multi-Rail config - with BODY!
>
> Hi Andreas,
>
> Thanks for the pointer. I have a second server set up running 2.15.3 as
> well specifically to check this, and can set it up with lnet_selftest,
> same as the client. After taking a bit to convince the fabric manager
> to accept the moved IPs, I get the exact same results between the two.
>
> Good to know that it is possible, though - I wonder what needs to be
> modified to achieve that. It's completely stock - the UDSP is just
> blank, and the default NRS config is in play.
>
> I don't suppose there's any chance the NRS config is what I'm missing?
>
> Gwen
>
> On Wed, 2024-01-17 at 03:14 +0000, Andreas Dilger wrote:
> > Hello Gwen,
> > I'm not a networking expert, but it seems entirely possible that the
> > MR discovery in 2.12.9 isn't doing as well as what is in 2.15.3 (or
> > 2.15.4 for that matter). It would make more sense to have both nodes
> > running the same (newer) version before digging too deeply into this.
> >
> > We have definitely seen performance > 1 IB interface from a single node
> > in our testing, though I can't say if that was done with lnet_selftest
> > or with something else.
> >
> > Cheers, Andreas
> >
> > > On Jan 16, 2024, at 08:14, Gwen Dawes via lustre-discuss
> > > <lustre-discuss@lists.lustre.org> wrote:
> > >
> > > Hi folks,
> > >
> > > Let's try that again.
> > >
> > > I'm in the luxury position of having four IB cards that I'm trying
> > > to squeeze as much Lustre performance out of as I can.
> > >
> > > I have a small test setup - two machines - a client (2.12.9) and a
> > > server (2.15.3) with four IB cards each. I'm able to set them up as
> > > Multi-Rail and each one can discover the other as such. However, I
> > > can't seem to get lnet_selftest to give me more speed than a single
> > > interface, as reported by ib_send_bw.
> > >
> > > Am I missing some config here? Is LNet just not capable of doing more
> > > than one connection per NID?
> > >
> > > Gwen
> > > _______________________________________________
> > > lustre-discuss mailing list
> > > lustre-discuss@lists.lustre.org
> > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> >
> > --
> > Andreas Dilger
> > Lustre Principal Architect
> > Whamcloud
